ABSTRACT

Mapping relational databases to RDF is a fundamental problem 
for the development of the Semantic Web. We present a solution, 
inspired by draft methods defined by the W3C where relational 
databases are directly mapped to RDF and OWL. Given a relational 
database schema and its integrity constraints, this direct mapping 
produces an OWL ontology, which provides the basis for generating
RDF instances. The semantics of this mapping is defined using
Datalog. Two fundamental properties are information preservation 
and query preservation. We prove that our mapping satisfies both 
conditions, even for relational databases that contain null values. 
We also consider two desirable properties: monotonicity and
semantics preservation. We prove that our mapping is monotone and
also prove that no monotone mapping, including ours, is semantics
preserving. Because monotonicity is an obstacle to semantics
preservation, we also present a non-monotone direct mapping
that is semantics preserving.

Categories and Subject Descriptors 

H.2.5 [Heterogeneous Databases]: Data translation; H.3.5 [Online
Information Services]: Web-based services 

Keywords 

Relational Databases, Semantic Web, Direct Mapping, RDB2RDF, 
SQL, SPARQL, RDF, OWL 

1. INTRODUCTION

In this paper, we study the problem of directly mapping a relational 
database to an RDF graph with OWL vocabulary. A direct mapping 
is a default and automatic way of translating a relational database 
to RDF. One report suggests that Internet-accessible databases
contained up to 500 times more data than the static Web and that
roughly 70% of websites are backed by relational databases, making
automatic translation of relational databases to RDF central to
the success of the Semantic Web [13].

We build on an existing direct mapping of relational database
schema to OWL DL [22] and the current draft of the W3C Direct
Mapping standard [5]. We study two properties that are fundamental
to a direct mapping: information preservation and query
preservation. Additionally, we study two desirable properties:
monotonicity and semantics preservation. To the best of our
knowledge, we are presenting the first direct mapping from a
relational database to an RDF graph with OWL vocabulary that has
been thoroughly studied with respect to these fundamental and
desirable properties.

Information preservation speaks to the ability of reconstructing 
the original database from the result of the direct mapping. Query 
preservation means that every query over a relational database can 
be translated into an equivalent query over the result of the direct 
mapping. Monotonicity is a desirable property because it ensures
that the entire mapping does not have to be recomputed after new
data is inserted into the database. Finally, a direct mapping is
semantics preserving if the satisfaction of a set of integrity
constraints is encoded in the mapping result.

Our proposed direct mapping is monotone, information preserv- 
ing and query preserving even in the general and practical scenario 
where relational databases contain null values. However, given a
database that violates an integrity constraint, our direct mapping
generates a consistent RDF graph; hence it is not semantics
preserving.

We analyze why our direct mapping is not semantics preserving 
and realize that monotonicity is an obstacle. We first show that 
if we only consider primary keys, we can still have a monotone 
direct mapping that is semantics preserving. However, this result is
not sufficient because it dismisses foreign keys. Unfortunately, we
prove that no monotone direct mapping is semantics preserving if
foreign keys are considered, essentially because the only form of
constraint checking in OWL is satisfiability testing. This result has
an important implication for real-world applications: if a relational
database is migrated to the Semantic Web using a monotone direct
mapping, the resulting RDF graph may be consistent even when the
database violates its constraints and inconsistency would be expected.

Finally, we present a non-monotone direct mapping that overcomes
the aforementioned limitation. We foresee the existence of
monotone direct mappings if OWL is extended with the epistemic 
operator. 

2. PRELIMINARIES 

In this section, we define the basic terminology used in the paper. 

2.1 Relational databases 

Assume a countably infinite domain $\mathbf{D}$ and a reserved symbol
NULL that is not in $\mathbf{D}$. A schema $\mathbf{R}$ is a finite set
of relation names, where for each $R \in \mathbf{R}$, $\mathit{att}(R)$
denotes the nonempty finite set of attribute names associated with $R$.
An instance $I$ of $\mathbf{R}$ assigns to each relation symbol
$R \in \mathbf{R}$ a finite set $R^I = \{t_1, \ldots, t_\ell\}$ of
tuples, where each tuple $t_j$ ($1 \le j \le \ell$) is a function that
assigns to each attribute in $\mathit{att}(R)$ a value from
$(\mathbf{D} \cup \{\text{NULL}\})$. We use the notation $t.A$ to refer
to the value of a tuple $t$ in an attribute $A$.

Relational algebra: To define some of the concepts studied in this
paper, we use relational algebra as a query language for relational
databases. Given that we consider relational databases containing
null values, we present in detail the syntax and semantics of a
version of relational algebra that formalizes the way nulls are treated
in practice in database systems. Formally, assume that $\mathbf{R}$ is a
relational schema. Then a relational algebra expression $\varphi$ over
$\mathbf{R}$ and its set of attributes $\mathit{att}(\varphi)$ are
recursively defined as follows:

1. If $\varphi = R$ with $R \in \mathbf{R}$, then $\varphi$ is a
relational algebra expression over $\mathbf{R}$ such that
$\mathit{att}(\varphi) = \mathit{att}(R)$.

2. If $\varphi = \text{NULL}_A$, where $A$ is an attribute, then
$\varphi$ is a relational algebra expression over $\mathbf{R}$ such that
$\mathit{att}(\varphi) = \{A\}$.

3. If $\psi$ is a relational algebra expression over $\mathbf{R}$,
$A \in \mathit{att}(\psi)$, $a \in \mathbf{D}$ and $\varphi$ is any of
the expressions $\sigma_{A=a}(\psi)$, $\sigma_{A \neq a}(\psi)$,
$\sigma_{\text{IsNull}(A)}(\psi)$ or $\sigma_{\text{IsNotNull}(A)}(\psi)$,
then $\varphi$ is a relational algebra expression over $\mathbf{R}$ such
that $\mathit{att}(\varphi) = \mathit{att}(\psi)$.

4. If $\psi$ is a relational algebra expression over $\mathbf{R}$,
$U \subseteq \mathit{att}(\psi)$ and $\varphi = \pi_U(\psi)$, then
$\varphi$ is a relational algebra expression over $\mathbf{R}$ such that
$\mathit{att}(\varphi) = U$.

5. If $\psi$ is a relational algebra expression over $\mathbf{R}$,
$A \in \mathit{att}(\psi)$, $B$ is an attribute such that
$B \notin \mathit{att}(\psi)$ and $\varphi = \delta_{A \to B}(\psi)$,
then $\varphi$ is a relational algebra expression over $\mathbf{R}$ such
that $\mathit{att}(\varphi) = (\mathit{att}(\psi) \setminus \{A\}) \cup \{B\}$.

6. If $\psi_1$, $\psi_2$ are relational algebra expressions over
$\mathbf{R}$ and $\varphi = (\psi_1 \bowtie \psi_2)$, then $\varphi$ is a
relational algebra expression over $\mathbf{R}$ such that
$\mathit{att}(\varphi) = (\mathit{att}(\psi_1) \cup \mathit{att}(\psi_2))$.

7. If $\psi_1$, $\psi_2$ are relational algebra expressions over
$\mathbf{R}$ such that $\mathit{att}(\psi_1) = \mathit{att}(\psi_2)$ and
$\varphi$ is either $(\psi_1 \cup \psi_2)$ or $(\psi_1 \setminus \psi_2)$,
then $\varphi$ is a relational algebra expression over $\mathbf{R}$ such
that $\mathit{att}(\varphi) = \mathit{att}(\psi_1)$.

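Let $\mathbf{R}$ be a relational schema, $I$ an instance of $\mathbf{R}$
and $\varphi$ a relational algebra expression over $\mathbf{R}$. The
evaluation of $\varphi$ over $I$, denoted by $[\![\varphi]\!]_I$, is
defined recursively as follows:

1. If $\varphi = R$ with $R \in \mathbf{R}$, then
$[\![\varphi]\!]_I = R^I$.

2. If $\varphi = \text{NULL}_A$, where $A$ is an attribute, then
$[\![\varphi]\!]_I = \{t\}$, where
$t : \{A\} \to (\mathbf{D} \cup \{\text{NULL}\})$ is a tuple such that
$t.A = \text{NULL}$.

3. Let $\psi$ be a relational algebra expression over $\mathbf{R}$,
$A \in \mathit{att}(\psi)$ and $a \in \mathbf{D}$.
If $\varphi = \sigma_{A=a}(\psi)$, then
$[\![\varphi]\!]_I = \{t \in [\![\psi]\!]_I \mid t.A = a\}$.
If $\varphi = \sigma_{A \neq a}(\psi)$, then
$[\![\varphi]\!]_I = \{t \in [\![\psi]\!]_I \mid t.A \neq \text{NULL} \text{ and } t.A \neq a\}$.
If $\varphi = \sigma_{\text{IsNull}(A)}(\psi)$, then
$[\![\varphi]\!]_I = \{t \in [\![\psi]\!]_I \mid t.A = \text{NULL}\}$.
If $\varphi = \sigma_{\text{IsNotNull}(A)}(\psi)$, then
$[\![\varphi]\!]_I = \{t \in [\![\psi]\!]_I \mid t.A \neq \text{NULL}\}$.

4. If $\psi$ is a relational algebra expression over $\mathbf{R}$,
$U \subseteq \mathit{att}(\psi)$ and $\varphi = \pi_U(\psi)$, then
$[\![\varphi]\!]_I = \{t : U \to (\mathbf{D} \cup \{\text{NULL}\}) \mid$
there exists $t' \in [\![\psi]\!]_I$ such that for every $A \in U$:
$t.A = t'.A\}$.

5. If $\psi$ is a relational algebra expression over $\mathbf{R}$,
$A \in \mathit{att}(\psi)$, $B$ is an attribute such that
$B \notin \mathit{att}(\psi)$ and $\varphi = \delta_{A \to B}(\psi)$, then
$[\![\varphi]\!]_I = \{t : \mathit{att}(\varphi) \to (\mathbf{D} \cup \{\text{NULL}\}) \mid$
there exists $t' \in [\![\psi]\!]_I$ such that $t.B = t'.A$ and for every
$C \in (\mathit{att}(\psi) \setminus \{A\})$: $t.C = t'.C\}$.

6. If $\psi_1$, $\psi_2$ are relational algebra expressions over
$\mathbf{R}$ and $\varphi = (\psi_1 \bowtie \psi_2)$, then
$[\![\varphi]\!]_I = \{t : \mathit{att}(\varphi) \to (\mathbf{D} \cup \{\text{NULL}\}) \mid$
there exist $t_1 \in [\![\psi_1]\!]_I$ and $t_2 \in [\![\psi_2]\!]_I$ such
that for every $A \in (\mathit{att}(\psi_1) \cap \mathit{att}(\psi_2))$:
$t.A = t_1.A = t_2.A \neq \text{NULL}$, for every
$A \in (\mathit{att}(\psi_1) \setminus \mathit{att}(\psi_2))$:
$t.A = t_1.A$, and for every
$A \in (\mathit{att}(\psi_2) \setminus \mathit{att}(\psi_1))$:
$t.A = t_2.A\}$.

7. Let $\psi_1$, $\psi_2$ be relational algebra expressions over
$\mathbf{R}$ such that $\mathit{att}(\psi_1) = \mathit{att}(\psi_2)$.
If $\varphi = (\psi_1 \cup \psi_2)$, then
$[\![\varphi]\!]_I = [\![\psi_1]\!]_I \cup [\![\psi_2]\!]_I$.
If $\varphi = (\psi_1 \setminus \psi_2)$, then
$[\![\varphi]\!]_I = [\![\psi_1]\!]_I \setminus [\![\psi_2]\!]_I$.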



It is important to notice that the operators left-outer join, right-outer 
join and full-outer join are all expressible with the previous opera- 
tors. For more details, we refer the reader to the Appendix. 
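
As an informal illustration of how nulls behave under the semantics above, the following Python sketch evaluates inequality selection and natural join over tuples represented as dictionaries. The representation (lists of dictionaries, with `None` standing in for NULL) and the function names are assumptions of this sketch, not part of the formal definition, and duplicate elimination is omitted.

```python
NULL = None  # stand-in for the reserved symbol NULL (an assumption of this sketch)


def select_eq(rows, attr, value):
    """sigma_{A=a}: keep tuples whose A-value equals the constant a (a is never NULL)."""
    return [t for t in rows if t[attr] == value]


def select_neq(rows, attr, value):
    """sigma_{A!=a}: a tuple with NULL in A never satisfies the inequality."""
    return [t for t in rows if t[attr] is not NULL and t[attr] != value]


def natural_join(left, right):
    """Natural join where shared attributes must agree and must not be NULL."""
    out = []
    for t1 in left:
        for t2 in right:
            shared = t1.keys() & t2.keys()
            if all(t1[a] is not NULL and t1[a] == t2[a] for a in shared):
                out.append({**t1, **t2})
    return out


student = [{"SID": 1, "NAME": "John"}, {"SID": 2, "NAME": NULL}]
print(select_neq(student, "NAME", "Mary"))
print(natural_join([{"A": 1, "B": NULL}], [{"B": NULL, "C": 3}]))
```

In this sketch, the tuple with a NULL name is dropped by the inequality selection, and two tuples that both carry NULL in a shared attribute do not join, mirroring items 3 and 6 of the definition.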

Integrity constraints: We consider two types of integrity
constraints: keys and foreign keys. Let $\mathbf{R}$ be a relational
schema. A key $\varphi$ over $\mathbf{R}$ is an expression of the form
$R[A_1, \ldots, A_m]$, where $R \in \mathbf{R}$ and
$\emptyset \subsetneq \{A_1, \ldots, A_m\} \subseteq \mathit{att}(R)$.
Given an instance $I$ of $\mathbf{R}$, $I$ satisfies key $\varphi$,
denoted by $I \models \varphi$, if: (1) for every $t \in R^I$ and
$k \in \{1, \ldots, m\}$, it holds that $t.A_k \neq \text{NULL}$, and
(2) for every $t_1, t_2 \in R^I$, if $t_1.A_k = t_2.A_k$ for every
$k \in \{1, \ldots, m\}$, then $t_1 = t_2$. A foreign key over
$\mathbf{R}$ is an expression of the form
$R[A_1, \ldots, A_m] \subseteq_{FK} S[B_1, \ldots, B_m]$, where
$R, S \in \mathbf{R}$,
$\emptyset \subsetneq \{A_1, \ldots, A_m\} \subseteq \mathit{att}(R)$ and
$\emptyset \subsetneq \{B_1, \ldots, B_m\} \subseteq \mathit{att}(S)$.
Given an instance $I$ of $\mathbf{R}$, $I$ satisfies foreign key
$\varphi$, denoted by $I \models \varphi$, if
$I \models S[B_1, \ldots, B_m]$ and for every tuple $t$ in $R^I$: either
(1) there exists $k \in \{1, \ldots, m\}$ such that
$t.A_k = \text{NULL}$, or (2) there exists a tuple $s$ in $S^I$ such that
$t.A_k = s.B_k$ for every $k \in \{1, \ldots, m\}$.

Given a relational schema $\mathbf{R}$, a set $\Sigma$ of keys and
foreign keys is said to be a set of primary keys (PKs) and foreign keys
(FKs) over $\mathbf{R}$ if: (1) for every $\varphi \in \Sigma$, it holds
that $\varphi$ is either a key or a foreign key over $\mathbf{R}$, and
(2) there are no two distinct keys in $\Sigma$ of the form
$R[A_1, \ldots, A_m]$ and $R[B_1, \ldots, B_n]$ (that is, that mention
the same relation name $R$). Moreover, an instance $I$ of $\mathbf{R}$
satisfies $\Sigma$, denoted by $I \models \Sigma$, if for every
$\varphi \in \Sigma$, it holds that $I \models \varphi$.
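
The two satisfaction conditions can be checked mechanically. The following Python sketch is one possible reading of the definitions, assuming relations are given as lists of dictionaries and `None` stands in for NULL; the function names are hypothetical.

```python
NULL = None  # stand-in for the reserved symbol NULL (an assumption of this sketch)


def satisfies_key(rows, attrs):
    """Check a key R[A1..Am]: no NULL in key attributes, and equal keys imply equal tuples."""
    seen = {}
    for t in rows:
        key = tuple(t[a] for a in attrs)
        if any(v is NULL for v in key):
            return False
        if key in seen and seen[key] != t:
            return False
        seen[key] = t
    return True


def satisfies_fk(r_rows, r_attrs, s_rows, s_attrs):
    """Check a foreign key R[A1..Am] <=_FK S[B1..Bm] as defined in the text."""
    # The referenced attributes must form a key of S.
    if not satisfies_key(s_rows, s_attrs):
        return False
    targets = {tuple(s[b] for b in s_attrs) for s in s_rows}
    for t in r_rows:
        ref = tuple(t[a] for a in r_attrs)
        if any(v is NULL for v in ref):
            continue  # condition (1): a NULL in the referencing attributes is allowed
        if ref not in targets:
            return False  # condition (2) fails: no matching tuple in S
    return True


dept = [{"DID": 10, "NAME": "CS"}]
course = [
    {"CID": 1, "TITLE": "DB", "CODE": 10},
    {"CID": 2, "TITLE": "AI", "CODE": NULL},
]
print(satisfies_key(course, ["CID"]), satisfies_fk(course, ["CODE"], dept, ["DID"]))
```

Note that the second course tuple has a NULL in CODE and still satisfies the foreign key, which is exactly case (1) of the definition.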

2.2 RDF and OWL 

Assume there are pairwise disjoint infinite sets $\mathbf{I}$ (IRIs),
$\mathbf{B}$ (blank nodes) and $\mathbf{L}$ (literals). A tuple
$(s, p, o) \in (\mathbf{I} \cup \mathbf{B}) \times \mathbf{I} \times (\mathbf{I} \cup \mathbf{B} \cup \mathbf{L})$
is called an RDF triple, where $s$ is the subject, $p$ is the predicate
and $o$ is the object. A finite set of RDF triples is called an RDF
graph. Moreover, assume the existence of an infinite set $\mathbf{V}$ of
variables disjoint from the above sets, and assume that every element in
$\mathbf{V}$ starts with the symbol ?.

In this paper, we consider RDF graphs with OWL vocabulary [1],
which is the W3C standard ontology language based on description
logics, without datatypes. In particular, we say that an RDF graph
$G$ is consistent under OWL semantics if a model of $G$ with respect
to the OWL vocabulary exists (see [1] for a precise definition of the
notion of model and the semantics of OWL).

2.3 SPARQL 

In this paper, we use SPARQL as a query language for RDF
graphs. The official syntax of SPARQL [17, 12] considers operators
OPTIONAL, UNION, FILTER, SELECT, AS and concatenation
via a point symbol (.), to construct graph pattern expressions.
The syntax of the language also considers { } to group patterns,
and some implicit rules of precedence and association. In order
to avoid ambiguities in the parsing, we follow the approach
proposed in [16], and we present the syntax of SPARQL graph patterns
in a more traditional algebraic formalism, using operators AND
(.), UNION (UNION), OPT (OPTIONAL), MINUS (MINUS),
FILTER (FILTER), SELECT (SELECT) and AS (AS). More
precisely, a SPARQL graph pattern expression is defined
recursively as follows.

1. { } is a graph pattern (the empty graph pattern).

2. A tuple from
$(\mathbf{I} \cup \mathbf{L} \cup \mathbf{V}) \times (\mathbf{I} \cup \mathbf{V}) \times (\mathbf{I} \cup \mathbf{L} \cup \mathbf{V})$
is a graph pattern (a triple pattern).

3. If $P_1$ and $P_2$ are graph patterns, then expressions
$(P_1 \text{ AND } P_2)$, $(P_1 \text{ OPT } P_2)$,
$(P_1 \text{ UNION } P_2)$ and $(P_1 \text{ MINUS } P_2)$ are
graph patterns.

4. If $P$ is a graph pattern and $R$ is a SPARQL built-in condition,
then the expression $(P \text{ FILTER } R)$ is a graph pattern.

5. If $P$ is a graph pattern and
$?A_1, \ldots, ?A_m, ?B_1, \ldots, ?B_m, ?C_1, \ldots, ?C_n$ is a
sequence of pairwise distinct elements from $\mathbf{V}$
($m \ge 0$ and $n \ge 0$) such that none of the variables $?B_i$
($1 \le i \le m$) is mentioned in $P$, then

$(\text{SELECT } \{?A_1 \text{ AS } ?B_1, \ldots, ?A_m \text{ AS } ?B_m, ?C_1, \ldots, ?C_n\}\ P)$

is a graph pattern.

A SPARQL built-in condition is constructed using elements of the
set $(\mathbf{I} \cup \mathbf{V})$ and constants, logical connectives
($\neg$, $\wedge$, $\vee$), inequality symbols ($<$, $\le$, $\ge$, $>$),
the equality symbol ($=$), and unary predicates such as bound, isBlank,
and isIRI (see [17, 12] for a complete list). In this paper, we restrict
to the fragment where the built-in condition is a Boolean combination of
terms constructed by using $=$ and bound, that is: (1) if
$?X, ?Y \in \mathbf{V}$ and $c \in \mathbf{I}$, then
$\text{bound}(?X)$, $?X = c$ and $?X = ?Y$ are built-in conditions, and
(2) if $R_1$ and $R_2$ are built-in conditions, then $(\neg R_1)$,
$(R_1 \vee R_2)$ and $(R_1 \wedge R_2)$ are built-in conditions.

The version of SPARQL used in this paper includes the following
SPARQL 1.1 features: the operator MINUS, the possibility of
nesting the SELECT operator and the operator AS [12].

The answer of a SPARQL query $P$ over an RDF graph $G$ is a
finite set of mappings, where a mapping $\mu$ is a partial function from
the set $\mathbf{V}$ of variables to
$(\mathbf{I} \cup \mathbf{L} \cup \mathbf{B})$. We define the semantics
of SPARQL as a function $[\![\cdot]\!]_G$ that, given an RDF graph $G$,
takes a graph pattern expression and returns a set of mappings. We refer
the reader to the Appendix for more detail.

3. DIRECT MAPPINGS: DEFINITION AND 
PROPERTIES 

A direct mapping is a default way to translate relational databases
into RDF (without any input from the user on how the relational
data should be translated). The input of a direct mapping
$\mathcal{M}$ is a relational schema $\mathbf{R}$, a set $\Sigma$ of PKs
and FKs over $\mathbf{R}$ and an instance $I$ of $\mathbf{R}$. The
output is an RDF graph with OWL vocabulary.

Assume $\mathcal{G}$ is the set of all RDF graphs and $\mathcal{RC}$ is
the set of all triples of the form $(\mathbf{R}, \Sigma, I)$ such that
$\mathbf{R}$ is a relational schema, $\Sigma$ is a set of PKs and FKs
over $\mathbf{R}$ and $I$ is an instance of $\mathbf{R}$.

Definition 1 (Direct mapping) A direct mapping $\mathcal{M}$ is a total
function from $\mathcal{RC}$ to $\mathcal{G}$.

We now introduce two fundamental properties of direct mappings:
information preservation and query preservation; and two desirable
properties of these mappings: monotonicity and semantics
preservation. Information preservation is a fundamental property
because it guarantees that the mapping does not lose information, which
is essential in an Extract-Transform-Load process. Query preservation
is also a fundamental property because it guarantees that
everything that can be extracted from the relational database by a
relational algebra query can also be extracted from the resulting
RDF graph by a SPARQL query. This property is fundamental for
workloads that involve translating SPARQL to SQL. Monotonicity
is a desirable property because it avoids recalculating the
mapping for the entire database after new data is inserted. In
addition to practical considerations when translating relational data
to RDF graphs, we must deal with the closed-world database
semantics and the open-world RDF/OWL semantics. It is therefore
important to understand the expressive power of a mapping and its
ability to deal properly with integrity constraints, which is why we
examine semantics preservation.



3.1 Fundamental properties 

Information preservation: A direct mapping is information pre- 
serving if it does not lose any information about the relational in- 
stance being translated, that is, if there exists a way to recover the 
original database instance from the RDF graph resulting from the 
translation process. Formally, assuming that $\mathcal{I}$ is the set of
all possible relational instances, we have that:

Definition 2 (Information preservation) A direct mapping $\mathcal{M}$
is information preserving if there is a computable mapping
$\mathcal{N} : \mathcal{G} \to \mathcal{I}$ such that for every
relational schema $\mathbf{R}$, set $\Sigma$ of PKs and FKs over
$\mathbf{R}$, and instance $I$ of $\mathbf{R}$ satisfying $\Sigma$:
$\mathcal{N}(\mathcal{M}(\mathbf{R}, \Sigma, I)) = I$.

Recall that a mapping $\mathcal{N} : \mathcal{G} \to \mathcal{I}$ is
computable if there exists an algorithm that, given
$G \in \mathcal{G}$, computes $\mathcal{N}(G)$.

Query preservation: A direct mapping is query preserving if ev- 
ery query over a relational database can be translated into an equiv- 
alent query over the RDF graph resulting from the mapping. That 
is, query preservation ensures that every relational query can be 
evaluated using the mapped RDF data. 

To formally define query preservation, we focus on relational
queries that can be expressed in relational algebra [3] and RDF
queries that can be expressed in SPARQL [17, 16]. In Section 2.1,
we introduced a version of relational algebra that formalizes the
semantics of null values in practice. In Section 2.3, we introduced
an algebraic version of SPARQL that follows the approach
proposed in [16]. Given the mismatch in the formats of these query
languages, we introduce a function $tr$ that converts tuples returned
by relational algebra queries into mappings returned by SPARQL.
Formally, given a relational schema $\mathbf{R}$, a relation name
$R \in \mathbf{R}$, an instance $I$ of $\mathbf{R}$ and a tuple
$t \in R^I$, define $tr(t)$ as the mapping $\mu$ such that: (1) the
domain of $\mu$ is
$\{?A \mid A \in \mathit{att}(R) \text{ and } t.A \neq \text{NULL}\}$,
and (2) $\mu(?A) = t.A$ for every $?A$ in the domain of $\mu$.

Example 1 Assume that a relational schema contains a relation
name STUDENT and attributes ID, NAME and AGE. Moreover,
assume that $t$ is a tuple in this relation such that $t.\text{ID} = 1$,
$t.\text{NAME} = \text{John}$ and $t.\text{AGE} = \text{NULL}$. Then,
$tr(t) = \mu$, where the domain of $\mu$ is $\{?\text{ID}, ?\text{NAME}\}$,
$\mu(?\text{ID}) = 1$ and $\mu(?\text{NAME}) = \text{John}$. □
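
The function tr of Example 1 can be sketched in a few lines of Python. This is only an illustration under the assumption that a tuple is a dictionary from attribute names to values and that `None` plays the role of NULL.

```python
NULL = None  # stand-in for the reserved symbol NULL (an assumption of this sketch)


def tr(t):
    """Turn a relational tuple into a SPARQL-style mapping, leaving NULL attributes unbound."""
    return {"?" + attr: value for attr, value in t.items() if value is not NULL}


t = {"ID": 1, "NAME": "John", "AGE": NULL}
print(tr(t))
```

The variable ?AGE does not appear in the resulting mapping, which is how an unbound variable in a SPARQL answer corresponds to a null value in the relational tuple.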

Definition 3 (Query preservation) A direct mapping $\mathcal{M}$ is
query preserving if for every relational schema $\mathbf{R}$, set
$\Sigma$ of PKs and FKs over $\mathbf{R}$ and relational algebra query
$Q$ over $\mathbf{R}$, there exists a SPARQL query $Q^\star$ such that
for every instance $I$ of $\mathbf{R}$ satisfying $\Sigma$:

$$tr([\![Q]\!]_I) = [\![Q^\star]\!]_{\mathcal{M}(\mathbf{R}, \Sigma, I)}.$$

It is important to notice that information preservation and query
preservation are incomparable properties in our setting. On the one
hand, if a direct mapping $\mathcal{M}$ is information preserving, this
does not guarantee that every relational algebra query $Q$ can be
rewritten into an equivalent SPARQL query over the translated data, as
$\mathcal{M}$ could transform source relational databases in such a way
that a more expressive query language is needed to express $Q$ over the
generated RDF graphs. On the other hand, a mapping $\mathcal{M}$ can be
query preserving and not information preserving if the information
about the schema of the relational database being translated is not
stored. For example, we define in Section 4 a direct mapping DM
that includes information about these relational schemas. It will
become clear in Sections 4 and 5 that if such information is not
stored, then DM would be query preserving but not information
preserving.

3.2 Desirable properties 

Monotonicity: Given two database instances $I_1$ and $I_2$ over a
relational schema $\mathbf{R}$, instance $I_1$ is said to be contained
in instance $I_2$, denoted by $I_1 \subseteq I_2$, if for every
$R \in \mathbf{R}$, it holds that $R^{I_1} \subseteq R^{I_2}$.
A direct mapping $\mathcal{M}$ is considered monotone if for any such
pair of instances, the result of mapping $I_2$ contains the result of
mapping $I_1$. In other words, if we insert new data into the database,
then the elements of the mapping that are already computed are
unaltered.

Definition 4 (Monotonicity) A direct mapping $\mathcal{M}$ is monotone
if for every relational schema $\mathbf{R}$, set $\Sigma$ of PKs and FKs
over $\mathbf{R}$, and instances $I_1$, $I_2$ of $\mathbf{R}$ such that
$I_1 \subseteq I_2$:
$\mathcal{M}(\mathbf{R}, \Sigma, I_1) \subseteq \mathcal{M}(\mathbf{R}, \Sigma, I_2)$.

Semantics preservation: A direct mapping is semantics preserv- 
ing if the satisfaction of a set of PKs and FKs by a relational database 
is encoded in the translation process. More precisely, given a
relational schema $\mathbf{R}$, a set $\Sigma$ of PKs and FKs over
$\mathbf{R}$ and an instance $I$ of $\mathbf{R}$, a semantics preserving
mapping should generate from $I$ a consistent RDF graph if
$I \models \Sigma$, and it should generate an inconsistent RDF graph
otherwise.

Definition 5 (Semantics preservation) A direct mapping $\mathcal{M}$ is
semantics preserving if for every relational schema $\mathbf{R}$, set
$\Sigma$ of PKs and FKs over $\mathbf{R}$ and instance $I$ of
$\mathbf{R}$: $I \models \Sigma$ iff
$\mathcal{M}(\mathbf{R}, \Sigma, I)$ is consistent under OWL semantics.

4. THE DIRECT MAPPING DM

We introduce a direct mapping DM that integrates and extends
the functionalities of the direct mappings proposed in [22, 5]. DM
is defined as a set of Datalog rules¹, which are divided into two parts:
rules that translate relational schemas and rules that translate
relational instances.

In Section 4.1, we present the predicates that are used to store a
relational database, the input of DM. In Section 4.2, we present
predicates that are used to store an ontology and Datalog rules to
generate an ontology from the relational schema and the set of PKs
and FKs. In Section 4.3, we present the Datalog rules that generate
the OWL vocabulary from the ontology that was derived from the
relational schema and a set of PKs and FKs. Finally, we present
in Section 4.4 the Datalog rules that generate RDF triples from a
relational instance.

Throughout this section, we use the following running example. 
Consider a relational database for a university. The schema 
of this database consists of tables STUDENT(SID, NAME),
COURSE(CID, TITLE, CODE), DEPT(DID, NAME) and
ENROLLED(SID, CID). Moreover, we have the following
constraints about the schema of the university: SID is the primary 
key of STUDENT, CID is the primary key of COURSE, DID is 
the primary key of DEPT, (SID, CID) is the primary key of 
ENROLLED, CODE is a foreign key in COURSE that references 
attribute DID in DEPT, SID is a foreign key in ENROLLED that 
references attribute SID in STUDENT, and CID is a foreign key in 
ENROLLED that references attribute CID in COURSE. 

4.1 Storing relational databases 

Given that the direct mapping DM is specified by a set of Datalog
rules, its input $(\mathbf{R}, \Sigma, I)$ has to be encoded as a set of
relations. We define the predicates that are used to store the triples
of the form $(\mathbf{R}, \Sigma, I)$. More precisely, the following
predicates are used to store a relational schema $\mathbf{R}$ and a set
$\Sigma$ of PKs and FKs over $\mathbf{R}$.

• $\text{Rel}(r)$: Indicates that $r$ is a relation name in
$\mathbf{R}$; e.g. Rel("STUDENT") indicates that STUDENT is a relation
name².

• $\text{Attr}(a, r)$: Indicates that $a$ is an attribute in the
relation $r$ in $\mathbf{R}$; e.g. Attr("NAME", "STUDENT") holds.

• $\text{PK}_n(a_1, \ldots, a_n, r)$: Indicates that
$r[a_1, \ldots, a_n]$ is a primary key in $\Sigma$; e.g.
$\text{PK}_1$("SID", "STUDENT") holds.

• $\text{FK}_n(a_1, \ldots, a_n, r, b_1, \ldots, b_n, s)$: Indicates that
$r[a_1, \ldots, a_n] \subseteq_{FK} s[b_1, \ldots, b_n]$ is a foreign key
in $\Sigma$; e.g. $\text{FK}_1$("CODE", "COURSE", "DID", "DEPT") holds.

1 We refer the reader to [3] for the syntax and semantics of Datalog.
2 As is customary, we use double quotes to delimit strings.

Moreover, the following predicate is used to store the tuples in a
relational instance $I$ of a relational schema $\mathbf{R}$.

• $\text{Value}(v, a, t, r)$: Indicates that $v$ is the value of an
attribute $a$ in a tuple with identifier $t$ in a relation $r$ (that
belongs to $\mathbf{R}$); e.g. a tuple $t_1$ of table STUDENT such that
$t_1.\text{SID}$ = "1" and $t_1.\text{NAME}$ = NULL is stored by using
the facts Value("1", "SID", "id1", "STUDENT") and
Value(NULL, "NAME", "id1", "STUDENT"), assuming that
id1 is the identifier of tuple $t_1$.
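
To make the encoding concrete, the following Python sketch stores the running example as sets of tuples, one set per predicate. The representation is an assumption of this sketch: attribute lists of $\text{PK}_n$ and $\text{FK}_n$ are grouped into Python tuples, `None` stands in for NULL, and tuple identifiers such as "id1" are generated by a hypothetical counter.

```python
NULL = None  # stand-in for the reserved symbol NULL (an assumption of this sketch)

# Schema facts for the university example.
REL = {"STUDENT", "COURSE", "DEPT", "ENROLLED"}
ATTR = {
    ("SID", "STUDENT"), ("NAME", "STUDENT"),
    ("CID", "COURSE"), ("TITLE", "COURSE"), ("CODE", "COURSE"),
    ("DID", "DEPT"), ("NAME", "DEPT"),
    ("SID", "ENROLLED"), ("CID", "ENROLLED"),
}
# PK_n(a1..an, r) is stored as ((a1..an), r).
PK = {
    (("SID",), "STUDENT"), (("CID",), "COURSE"),
    (("DID",), "DEPT"), (("SID", "CID"), "ENROLLED"),
}
# FK_n(a1..an, r, b1..bn, s) is stored as ((a1..an), r, (b1..bn), s).
FK = {
    (("CODE",), "COURSE", ("DID",), "DEPT"),
    (("SID",), "ENROLLED", ("SID",), "STUDENT"),
    (("CID",), "ENROLLED", ("CID",), "COURSE"),
}


def encode_instance(relation, rows, id_prefix="id"):
    """Return Value(v, a, t, r) facts for the rows of one relation."""
    facts = set()
    for i, row in enumerate(rows, start=1):
        tuple_id = f"{id_prefix}{i}"  # hypothetical identifier scheme
        for attr, value in row.items():
            facts.add((value, attr, tuple_id, relation))
    return facts


VALUE = encode_instance("STUDENT", [{"SID": "1", "NAME": NULL}])
print(sorted(VALUE, key=str))
```

The null value is stored as an explicit fact rather than being omitted, matching the example given for the Value predicate.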

4.2 Storing an ontology 

In order to translate a relational database into an RDF graph with 
OWL vocabulary, we first extract an ontology from the relational 
schema and the set of PKs and FKs given as input. In particular, we 
classify each relation name in the schema as a class or a binary re- 
lation (which is used to represent a many-to-many relationship be- 
tween entities in an ER/UML diagram), we represent foreign keys 
as object properties and attributes of relations as data type proper- 
ties. More specifically, the following predicates are used to store 
the extracted ontology: 

• $\text{Class}(c)$: Indicates that $c$ is a class.

• $\text{OP}_n(p_1, \ldots, p_n, d, r)$: Indicates that
$p_1, \ldots, p_n$ ($n \ge 1$) form an object property with domain $d$
and range $r$.

• $\text{DTP}(p, d)$: Indicates that $p$ is a data type property with
domain $d$.

The above predicates are defined by the Datalog rules described in 
the following sections. 

Identifying binary relations: We define auxiliary predicates that 
identify binary relations to facilitate identifying classes, object
properties and data type properties. Informally, a relation R is a binary
relation between two relations S and T if (1) both S and T are 
different from R, (2) R has exactly two attributes A and B, which 
form a primary key of R, (3) A is the attribute of a foreign key in 
R that points to S, (4) B is the attribute of a foreign key in R that 
points to T, (5) A is not the attribute of two distinct foreign keys in 
R, (6) B is not the attribute of two distinct foreign keys in R, (7) 
A and B are not the attributes of a composite foreign key in R, and 
(8) relation R does not have incoming foreign keys. In Datalog this 
becomes: 

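$$
\begin{aligned}
\text{BinRel}(R, A, B, S, C, T, D) \leftarrow{} & \text{PK}_2(A, B, R),\ \neg\text{ThreeAttr}(R), \\
& \text{FK}_1(A, R, C, S),\ R \neq S,\ \text{FK}_1(B, R, D, T),\ R \neq T, \\
& \neg\text{TwoFK}(A, R),\ \neg\text{TwoFK}(B, R), \\
& \neg\text{OneFK}(A, B, R),\ \neg\text{FKto}(R).
\end{aligned}
\tag{1}
$$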

In a Datalog rule, negation is represented with the symbol $\neg$ and
upper case letters are used to denote variables. Thus, the previous
rule states that the relation $R$ is a binary relation between two
relations $S$ and $T$ if the following conditions are satisfied.
(a) Expression $\text{PK}_2(A, B, R)$ in (1) indicates that attributes
$A$ and $B$ form a primary key of $R$. (b) Predicate ThreeAttr checks
whether a relation has at least three attributes, and it is defined as
follows from the base predicate Attr:

$$
\begin{aligned}
\text{ThreeAttr}(R) \leftarrow{} & \text{Attr}(X, R),\ \text{Attr}(Y, R),\ \text{Attr}(Z, R), \\
& X \neq Y,\ X \neq Z,\ Y \neq Z.
\end{aligned}
$$

Thus, expression $\neg\text{ThreeAttr}(R)$ in (1) indicates that $R$ has
at most two attributes. Notice that by combining this expression with
$\text{PK}_2(A, B, R)$, we conclude that $A$, $B$ are exactly the
attributes of $R$. (c) Expressions $\text{FK}_1(A, R, C, S)$ and
$\text{FK}_1(B, R, D, T)$ in (1) indicate that $A$ is the attribute of a
foreign key in $R$ that points to $S$ and $B$ is the attribute of a
foreign key in $R$ that points to $T$, respectively. (d) Expressions
$R \neq S$ and $R \neq T$ in (1) indicate that both $S$ and $T$ are
different from relation $R$. (e) Predicate TwoFK checks whether an
attribute of a relation is the attribute of two distinct foreign keys in
that relation, and it is defined as follows from the base predicate
$\text{FK}_1$:

$$
\begin{aligned}
\text{TwoFK}(X, Y) &\leftarrow \text{FK}_1(X, Y, U_1, V_1),\ \text{FK}_1(X, Y, U_2, V_2),\ U_1 \neq U_2 \\
\text{TwoFK}(X, Y) &\leftarrow \text{FK}_1(X, Y, U_1, V_1),\ \text{FK}_1(X, Y, U_2, V_2),\ V_1 \neq V_2
\end{aligned}
$$

Thus, expressions $\neg\text{TwoFK}(A, R)$ and $\neg\text{TwoFK}(B, R)$
in (1) indicate that attribute $A$ is not the attribute of two distinct
foreign keys in $R$ and $B$ is not the attribute of two distinct foreign
keys in $R$, respectively. (f) Predicate OneFK checks whether a pair of
attributes of a relation are the attributes of a composite foreign key
in that relation:

$$
\begin{aligned}
\text{OneFK}(X, Y, Z) &\leftarrow \text{FK}_2(X, Y, Z, U, V, W) \\
\text{OneFK}(X, Y, Z) &\leftarrow \text{FK}_2(Y, X, Z, U, V, W)
\end{aligned}
$$

Thus, expression $\neg\text{OneFK}(A, B, R)$ in (1) indicates that
attributes $A$, $B$ of $R$ are not the attributes of a composite foreign
key in $R$. (g) Finally, predicate FKto checks whether a relation with
two attributes has incoming foreign keys:

$$
\begin{aligned}
\text{FKto}(X) &\leftarrow \text{FK}_1(U_1, Y, V, X) \\
\text{FKto}(X) &\leftarrow \text{FK}_2(U_1, U_2, Y, V_1, V_2, X)
\end{aligned}
$$

Thus, expression $\neg\text{FKto}(R)$ in (1) indicates that relation $R$
does not have incoming foreign keys.

For instance, BinRel("ENROLLED", "SID", "CID",
"STUDENT", "SID", "COURSE", "CID") holds in our example.
Note that there is no condition in rule (1) that requires
S and T to be different, allowing binary relations that have their
domain equal to their range. Also note that, for simplicity, we
assume in rule (1) that a binary relation R consists of only two
attributes A and B. However, this rule can be easily extended to
deal with binary relations generated from many-to-many
relationships between entities in an ER/UML diagram that have more than
two attributes.
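
The conditions of rule (1) can also be read procedurally. The Python sketch below is a direct, unoptimized transcription under some assumptions of its own: facts are stored as tuples (with attribute lists grouped into Python tuples), and the function name `bin_rels` is hypothetical. It is meant only to illustrate the rule, not to replace a Datalog engine.

```python
ATTR = {
    ("SID", "STUDENT"), ("NAME", "STUDENT"),
    ("CID", "COURSE"), ("TITLE", "COURSE"), ("CODE", "COURSE"),
    ("DID", "DEPT"), ("NAME", "DEPT"),
    ("SID", "ENROLLED"), ("CID", "ENROLLED"),
}
PK = {
    (("SID",), "STUDENT"), (("CID",), "COURSE"),
    (("DID",), "DEPT"), (("SID", "CID"), "ENROLLED"),
}
FK = {
    (("CODE",), "COURSE", ("DID",), "DEPT"),
    (("SID",), "ENROLLED", ("SID",), "STUDENT"),
    (("CID",), "ENROLLED", ("CID",), "COURSE"),
}


def bin_rels(attr, pk, fk):
    """Return all (R, A, B, S, C, T, D) tuples satisfying the body of rule (1)."""
    attrs_of = {}
    for a, r in attr:
        attrs_of.setdefault(r, set()).add(a)
    # FKto: relations with incoming foreign keys. Any arity is counted here, which
    # coincides with the two FKto rules for relations that have exactly two attributes.
    fk_to = {target for (_, _, _, target) in fk}
    result = set()
    for cols, r in pk:
        # PK_2(A, B, R) and not ThreeAttr(R)
        if len(cols) != 2 or len(attrs_of.get(r, set())) >= 3:
            continue
        a, b = cols
        unary = {(c[0], tc[0], t) for (c, src, tc, t) in fk if src == r and len(c) == 1}
        refs_a = [(ref, s) for (x, ref, s) in unary if x == a]
        refs_b = [(ref, t) for (x, ref, t) in unary if x == b]
        two_fk = len(refs_a) > 1 or len(refs_b) > 1            # TwoFK(A, R) or TwoFK(B, R)
        one_fk = any(src == r and len(c) == 2 and set(c) == {a, b}
                     for (c, src, _, _) in fk)                  # OneFK(A, B, R)
        if two_fk or one_fk or r in fk_to or not refs_a or not refs_b:
            continue
        (c_attr, s), (d_attr, t) = refs_a[0], refs_b[0]
        if s != r and t != r:                                   # R != S and R != T
            result.add((r, a, b, s, c_attr, t, d_attr))
    return result


print(bin_rels(ATTR, PK, FK))
```

On the running example, only ENROLLED qualifies. Adding a foreign key that points to ENROLLED would make FKto hold for it and remove it from the result.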

Identifying classes: In our context, a class is any relation that is 
not a binary relation. That is, predicate CLASS is defined by the 
following Datalog rules: 

$$
\begin{aligned}
\text{Class}(X) &\leftarrow \text{Rel}(X),\ \neg\text{IsBinRel}(X) \\
\text{IsBinRel}(X) &\leftarrow \text{BinRel}(X, A, B, S, C, T, D)
\end{aligned}
$$

In our example, Class("DEPT"), Class("STUDENT") and
Class("COURSE") hold.

Identifying object properties: For every $n \ge 1$, the following
rule is used for identifying object properties that are generated from
foreign keys:

$$
\begin{aligned}
\text{OP}_{2n}(X_1, \ldots, X_n, Y_1, \ldots, Y_n, S, T) \leftarrow{} & \text{FK}_n(X_1, \ldots, X_n, S, Y_1, \ldots, Y_n, T), \\
& \neg\text{IsBinRel}(S)
\end{aligned}
$$

3 Notice that although we consider an infinite number of rules in
the definition of DM, for every concrete relational database we
will need only a finite number of these rules.



This rule states that a foreign key represents an object property
from the entity containing the foreign key (domain) to the
referenced entity (range). It should be noticed that this rule excludes
the case of binary relations, as there is a special rule for this type
of relations (see rule (1)). In our example,
$\text{OP}_2$("CODE", "DID", "COURSE", "DEPT") holds as CODE is a
foreign key in the table COURSE that references attribute DID in the
table DEPT.

Identifying data type properties: Every attribute in a non-binary 
relation is mapped to a data type property: 

$$\text{DTP}(A, R) \leftarrow \text{Attr}(A, R),\ \neg\text{IsBinRel}(R)$$

For instance, we have that DTP("NAME", "STUDENT") holds in
our example, while DTP("SID", "ENROLLED") does not hold as
ENROLLED is a binary relation.
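
The three rules for classes, object properties and data type properties amount to simple filters over the stored facts. A minimal Python sketch follows, assuming the set of binary relation names (the IsBinRel predicate) has already been computed and that facts are stored as tuples; the function name is hypothetical.

```python
REL = {"STUDENT", "COURSE", "DEPT", "ENROLLED"}
ATTR = {
    ("SID", "STUDENT"), ("NAME", "STUDENT"),
    ("CID", "COURSE"), ("TITLE", "COURSE"), ("CODE", "COURSE"),
    ("DID", "DEPT"), ("NAME", "DEPT"),
    ("SID", "ENROLLED"), ("CID", "ENROLLED"),
}
# FK_n(a1..an, r, b1..bn, s) stored as ((a1..an), r, (b1..bn), s).
FK = {
    (("CODE",), "COURSE", ("DID",), "DEPT"),
    (("SID",), "ENROLLED", ("SID",), "STUDENT"),
    (("CID",), "ENROLLED", ("CID",), "COURSE"),
}
IS_BIN_REL = {"ENROLLED"}  # assumed to be derived beforehand from rule (1)


def extract_ontology(rel, attr, fk, is_bin_rel):
    """Derive Class, OP (from foreign keys only) and DTP facts from schema facts."""
    classes = {r for r in rel if r not in is_bin_rel}
    # OP_2n(X1..Xn, Y1..Yn, S, T) stored as ((X1..Xn), (Y1..Yn), S, T).
    ops = {(xs, ys, s, t) for (xs, s, ys, t) in fk if s not in is_bin_rel}
    dtps = {(a, r) for (a, r) in attr if r not in is_bin_rel}
    return classes, ops, dtps


classes, ops, dtps = extract_ontology(REL, ATTR, FK, IS_BIN_REL)
print(sorted(classes))
print(ops)
```

Object properties that come from binary relations are handled by a separate rule and are not produced by this sketch.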

4.3 Translating a relational schema into OWL 

We now define the rules that translate a relational database schema
into an OWL vocabulary.

4.3.1 Generating IRIs for classes, object properties and data type properties
We introduce a family of rules that produce IRIs for classes,
binary relations, object properties and data type properties
identified by the mapping (which are stored in the predicates Class,
BinRel, $\text{OP}_n$ and DTP, respectively). Note that the IRIs
generated can later be replaced or mapped to existing IRIs
available in the Semantic Web. Assume given a base IRI
base for the relational database to be translated (for example,
"http://example.edu/db/"), and assume given a family of
built-in predicates $\text{Concat}_n$ ($n \ge 2$) such that
$\text{Concat}_n$ has $n+1$ arguments and
$\text{Concat}_n(x_1, \ldots, x_n, y)$ holds if $y$ is the
concatenation of the strings $x_1, \ldots, x_n$. Then by following the
approach proposed in [5], DM uses the following Datalog rules to produce
IRIs for classes and data type properties:

$$
\begin{aligned}
\text{ClassIRI}(R, X) &\leftarrow \text{Class}(R),\ \text{Concat}_2(\text{base}, R, X) \\
\text{DTP\_IRI}(A, R, X) &\leftarrow \text{DTP}(A, R),\ \text{Concat}_4(\text{base}, R, \texttt{"\#"}, A, X)
\end{aligned}
$$

For instance, |http : //example ■ edu/db/STUDENT| is 
the IRI for the STUDENT relation in our example, and 
|http : //example ■ edu/db/ STUDENT #NAME| is the IRI 
for attribute NAME in the STUDENT relation (recall that 
DTP(" NAME ", "STUDENT") holds in our example). More- 
over, T>M uses the following family of Datalog rules to generate 
IRIs for object properties. First, for object properties generated 
from binary relations, the following rules is used: 

OP^IRIi (R, A, B, S, C, T, D, X) <— 

BinRel(R, A, B, S, C, T, D), 

CONCATio(base,/?, "#", A, " , ",B, ", ",C, ", ",D,X) 

Thus, [http : / /example ■ edu/ db/ENRQLLED#SID^ CID, SID, CID 
is the IRI for binary relation ENROLLED in our example. Second, 
for object properties generated from a foreign key consisting of n 
attributes (n > 1), the following rule is used: 

OP_IRI 2 n(*i, ■ ..,X n ,Yi,...,Y n ,S,T,X)<- 
OP 2n (X u ... ,X n ,Y u ... ,Y n , S,T), 

CONCAT4„ +4 (base, S, ", ", T, "#", X\ , X„_i, " , ", 

Xn,",",ii,",",...,y„_i,",",y„,x) 

Thus, given that OP_2("CODE", "DID", "COURSE", "DEPT") holds
in our example, the IRI
http://example.edu/db/COURSE,DEPT#CODE,DID is
generated to represent the fact that CODE is a foreign key in the
table COURSE that references attribute DID in the table DEPT.
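
The IRI-generating rules above amount to string concatenation, so they can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the base IRI is the one assumed in the running example, and no percent-encoding of special characters is attempted.

```python
BASE = "http://example.edu/db/"  # assumed base IRI, as in the running example


def concat(*parts):
    """CONCAT_n: the concatenation of the given strings."""
    return "".join(parts)


def class_iri(relation, base=BASE):
    # CLASSIRI(R, X) <- CLASS(R), CONCAT_2(base, R, X)
    return concat(base, relation)


def dtp_iri(attribute, relation, base=BASE):
    # DTP_IRI(A, R, X) <- DTP(A, R), CONCAT_4(base, R, "#", A, X)
    return concat(base, relation, "#", attribute)


def op_iri_binrel(r, a, b, c, d, base=BASE):
    # Object property generated from a binary relation R(A, B),
    # where A references S.C and B references T.D.
    return concat(base, r, "#", a, ",", b, ",", c, ",", d)


def op_iri_fk(xs, ys, s, t, base=BASE):
    # Object property generated from a foreign key of S (attributes xs)
    # referencing T (attributes ys).
    return concat(base, s, ",", t, "#", ",".join(list(xs) + list(ys)))
```

Calling `op_iri_fk(["CODE"], ["DID"], "COURSE", "DEPT")` reproduces the foreign-key IRI of the example.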



4.3.2 Translating relational schemas 

The following Datalog rules are used to generate the RDF repre- 
sentation of the OWL vocabulary. First, a rule is used to collect all 
the classes: 

TRIPLE(U, "rdf:type", "owl:Class") ← CLASS(R), CLASSIRI(R, U)

Predicate TRIPLE is used to collect all the triples of the RDF graph
generated by the direct mapping DM. Second, the following family
of rules is used to collect all the object properties (n ≥ 1):

TRIPLE(U, "rdf:type", "owl:ObjectProperty") ←
    OP_n(X_1, ..., X_n, S, T), OP_IRI_n(X_1, ..., X_n, S, T, U)

Third, the following rule is used to collect the domains of the object
properties (n ≥ 1):

TRIPLE(U, "rdfs:domain", W) ← OP_n(X_1, ..., X_n, S, T),
    OP_IRI_n(X_1, ..., X_n, S, T, U), CLASSIRI(S, W)

Fourth, the following rule is used to collect the ranges of the object
properties (n ≥ 1):

TRIPLE(U, "rdfs:range", W) ← OP_n(X_1, ..., X_n, S, T),
    OP_IRI_n(X_1, ..., X_n, S, T, U), CLASSIRI(T, W)

Fifth, the following rule is used to collect all the data type proper- 
ties: 

TRIPLE(U, "rdf:type", "owl:DatatypeProperty") ←
    DTP(A, R), DTP_IRI(A, R, U)

Finally, the following rule is used to collect the domains of the data 
type properties: 

TRIPLE(U, "rdfs:domain", W) ←
    DTP(A, R), DTP_IRI(A, R, U), CLASSIRI(R, W)
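
The six schema rules above can be read as one pass over the CLASS, OP and DTP facts. The following Python sketch illustrates that reading under simplifying assumptions: the function name and the input encoding are hypothetical, object-property IRIs are assumed to be already built as in Section 4.3.1, and triples are represented as plain tuples of strings.

```python
BASE = "http://example.edu/db/"  # assumed base IRI from the running example


def schema_triples(classes, dtps, ops):
    """Collect the OWL vocabulary triples for a schema.

    classes: set of relation names R such that CLASS(R) holds
    dtps:    set of (attribute, relation) pairs such that DTP(A, R) holds
    ops:     set of (property_iri, domain_relation, range_relation)
    """
    class_iri = {r: BASE + r for r in classes}
    triples = set()
    for r in classes:
        triples.add((class_iri[r], "rdf:type", "owl:Class"))
    for iri, s, t in ops:
        triples.add((iri, "rdf:type", "owl:ObjectProperty"))
        # Domain and range are emitted only when CLASSIRI is defined.
        if s in class_iri:
            triples.add((iri, "rdfs:domain", class_iri[s]))
        if t in class_iri:
            triples.add((iri, "rdfs:range", class_iri[t]))
    for a, r in dtps:
        iri = BASE + r + "#" + a
        triples.add((iri, "rdf:type", "owl:DatatypeProperty"))
        if r in class_iri:
            triples.add((iri, "rdfs:domain", class_iri[r]))
    return triples
```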

4.4 Translating a database instance into RDF 

We now define the rules that map a relational database instance 
into RDF. More specifically, we first introduce a series of rules for 
generating IRIs, and then we present the Datalog rules that generate 
RDF. 

4.4.1 Generating IRIs for tuples

We introduce a family of predicates that produce IRIs for
the tuples being translated, where we assume given a
base IRI base for the relational database (for example,
"http://example.edu/db/"). First, DM uses the following
Datalog rule to produce IRIs for the tuples of the relations
having a primary key:

ROWIRI_n(V_1, V_2, ..., V_n, A_1, A_2, ..., A_n, T, R, X) ←
    PK_n(A_1, A_2, ..., A_n, R), VALUE(V_1, A_1, T, R),
    VALUE(V_2, A_2, T, R), ..., VALUE(V_n, A_n, T, R),
    CONCAT_{4n+2}(base, R, "#", A_1, "=", V_1, ",",
                  A_2, "=", V_2, ",", ..., ",", A_n, "=", V_n, X)

Thus, given that the facts PK_1("SID", "STUDENT") and
VALUE("1", "SID", "id1", "STUDENT") hold in our example,
the IRI http://example.edu/db/STUDENT#SID=1 is the
identifier for the tuple in table STUDENT with value 1 in the
primary key. Moreover, DM uses the following rule to generate blank
nodes for the tuples of the relations not having a primary key:

BLANKNODE(T, R, X) ←
    VALUE(V, A, T, R), CONCAT_3("_:", R, T, X)
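
As a rough illustration of the two rules above, the following Python sketch builds a row IRI from the primary-key values of a tuple and a blank-node label from its tuple identifier. The function names, the dictionary encoding of a row, and the relations GRADE and LOG in the usage note are assumptions made for this example only.

```python
BASE = "http://example.edu/db/"  # assumed base IRI from the running example


def row_iri(relation, pk_attrs, row, base=BASE):
    """Row IRI: base + R + "#" + "A_1=V_1,...,A_n=V_n" over the PK attributes."""
    pairs = ",".join(f"{a}={row[a]}" for a in pk_attrs)
    return f"{base}{relation}#{pairs}"


def blank_node(relation, tuple_id):
    """Blank node label: "_:" followed by the relation name and tuple id."""
    return f"_:{relation}{tuple_id}"
```

For a hypothetical relation GRADE with composite key (SID, CID), `row_iri("GRADE", ["SID", "CID"], {"SID": "1", "CID": "10"})` yields an IRI ending in `GRADE#SID=1,CID=10`, while a tuple `id7` of a hypothetical keyless relation LOG is labelled `_:LOGid7`.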



4.4.2 Translating relational instances 
The direct mapping DM generates three types of triples when 
translating a relational instance: Table triples, reference triples and 
literal triples [5]. Following are the Datalog rules for each one of 
these cases. 

For table triples, DM produces for each tuple t in a relation R, a
triple indicating that t is of type r, where r is the class IRI
generated for R. To construct these triples, DM
uses the following auxiliary rules:

TUPLEID(T, R, X) ←
    CLASS(R), PK_n(A_1, ..., A_n, R),
    VALUE(V_1, A_1, T, R), ..., VALUE(V_n, A_n, T, R),
    ROWIRI_n(V_1, ..., V_n, A_1, ..., A_n, T, R, X)

TUPLEID(T, R, X) ←
    CLASS(R), ¬HASPK_n(R),
    VALUE(V, A, T, R), BLANKNODE(T, R, X)

That is, TUPLEID(T, R, X) generates the identifier X of a tuple
T of a relation R, which is an IRI if R has a primary key or a
blank node otherwise. Notice that in the preceding rules, predicate
HASPK_n is used to check whether a table R with n attributes has
a primary key (thus, ¬HASPK_n(R) indicates that R does not have
a primary key). Predicate HASPK_n is defined by the following n
rules:

HASPK_n(X) ← PK_i(A_1, ..., A_i, X)        i ∈ {1, ..., n}

The following rule generates the table triples:

TRIPLE(U, "rdf:type", W) ←
    VALUE(V, A, T, R), TUPLEID(T, R, U), CLASSIRI(R, W)

For example, the following is a table triple in our example: 

TRIPLE("http://example.edu/db/STUDENT#SID=1",
       "rdf:type",
       "http://example.edu/db/STUDENT")

For reference triples, DM generates triples that store the references 
generated by binary relations and foreign keys. More precisely, 
the following Datalog rule is used to construct reference triples for 
object properties that are generated from binary relations: 

TRIPLE(U, V, W) ← BINREL(R, A, B, S, C, T, D),
    VALUE(V_1, A, T_1, R), VALUE(V_1, C, T_2, S),
    VALUE(V_2, B, T_1, R), VALUE(V_2, D, T_3, T),
    TUPLEID(T_2, S, U),
    OP_IRI_1(R, A, B, S, C, T, D, V),
    TUPLEID(T_3, T, W)

Moreover, the following Datalog rule is used to construct reference
triples for object properties that are generated from foreign keys
(n ≥ 1):

TRIPLE(U, V, W) ←
    OP_2n(A_1, ..., A_n, B_1, ..., B_n, S, T),
    VALUE(V_1, A_1, T_1, S), ..., VALUE(V_n, A_n, T_1, S),
    VALUE(V_1, B_1, T_2, T), ..., VALUE(V_n, B_n, T_2, T),
    TUPLEID(T_1, S, U), TUPLEID(T_2, T, W),
    OP_IRI_2n(A_1, ..., A_n, B_1, ..., B_n, S, T, V)

Finally, DM produces for every tuple t in a relation R and for 
every attribute A of R, a triple storing the value of t in A, which is 
called a literal triple. The following Datalog rule is used to generate 
such triples: 



TRIPLE(U, V, W) ← DTP(A, R), VALUE(W, A, T, R),
    W ≠ NULL, TUPLEID(T, R, U), DTP_IRI(A, R, V)

Notice that in the above rule, we use the condition W ≠ NULL to
check that the value of the attribute A in a tuple T in a relation R is
not null. Thus, literal triples are generated only for non-null values.
The following is an example of a literal triple:

TRIPLE("http://example.edu/db/STUDENT#SID=1",
       "http://example.edu/db/STUDENT#NAME", "John")

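The table-triple and literal-triple rules can be sketched together in Python. This is only an illustration under stated assumptions: VALUE facts are encoded as (value, attribute, tuple id, relation) tuples, NULL is represented by `None`, `pks` is a hypothetical dictionary from relation names to primary-key attribute lists, and reference triples are left out for brevity.

```python
BASE = "http://example.edu/db/"  # assumed base IRI from the running example
NULL = None                      # assumed encoding of the relational NULL


def instance_triples(values, classes, dtps, pks):
    """Generate table triples and literal triples from VALUE facts."""
    rows = {}
    for v, a, t, r in values:
        rows.setdefault((t, r), {})[a] = v
    triples = set()
    for (t, r), row in rows.items():
        if r not in classes:
            continue
        if r in pks:  # tuple identified by a row IRI
            key = ",".join(f"{a}={row[a]}" for a in pks[r])
            subject = f"{BASE}{r}#{key}"
        else:         # no primary key: blank node
            subject = f"_:{r}{t}"
        triples.add((subject, "rdf:type", BASE + r))
        for a, v in row.items():
            # Literal triples are produced only for non-null values.
            if (a, r) in dtps and v is not NULL:
                triples.add((subject, f"{BASE}{r}#{a}", v))
    return triples
```

With two STUDENT tuples, one of which has a null NAME, the sketch emits a NAME triple only for the tuple whose NAME is not null.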
5. PROPERTIES OF DM

We now study our direct mapping DM with respect to the two
fundamental properties (information preservation and query
preservation) and the two desirable properties (monotonicity and
semantics preservation) defined in Section 3.

5.1 Information preservation of DM

First, we show that DM does not lose any piece of information in
the relational instance being translated:

Theorem 1 The direct mapping DM is information preserving.

The proof of this theorem is straightforward, and it involves
providing a computable mapping $\mathcal{N} : \mathcal{G} \to \mathcal{I}$
that satisfies the condition in Definition 2, that is, a computable
mapping $\mathcal{N}$ that can reconstruct the initial relational
instance from the generated RDF graph.

5.2 Query preservation of DM

Second, we show that the way DM maps relational data into RDF
allows one to answer a query over a relational instance by
translating it into an equivalent query over the generated RDF graph.

Theorem 2 The direct mapping DM is query preserving.

In [4], it was proved that SPARQL has the same expressive power
as relational algebra. Thus, one may be tempted to think that this
result could be used to prove Theorem 2. However, the version of
relational algebra considered in [4] does not include the null value
NULL, and hence cannot be used to prove our result. In addition,
other researchers have addressed the issue of query answering
on DL ontologies with relational databases [20]. Our work is
similar in the sense that we address the issue of query preservation
between a database and an ontology. However, the main difference
is that rather than a domain ontology, the ontology we use is
synthesized in a standard way from the database schema. Therefore,
their results cannot be directly applied to our setting.

We present an outline of the proof of this theorem, and refer the
reader to the Appendix for the details. Assume given a relational
schema $\mathbf{R}$ and a set $\Sigma$ of PKs and FKs over $\mathbf{R}$.
Then we have to show that for every relational algebra query $Q$ over
$\mathbf{R}$, there exists a SPARQL query $Q^\star$ such that for every
instance $I$ of $\mathbf{R}$ (possibly including null values)
satisfying $\Sigma$:

\[ tr([\![Q]\!]_I) = [\![Q^\star]\!]_{\mathcal{DM}(\mathbf{R},\Sigma,I)} \qquad (2) \]

Interestingly, the proof that the previous condition holds is by
induction on the structure of $Q$, and thus it gives us a bottom-up
algorithm for translating $Q$ into an equivalent SPARQL query $Q^\star$,
that is, a query $Q^\star$ satisfying condition (2). In what follows, we
consider the database used as example in Section 4 and the
relational algebra query $\sigma_{Name=Juan}(STUDENT) \bowtie ENROLLED$,
which we use as a running example and translate step by
step to SPARQL, showing how the translation algorithm works.



For the sake of readability, we introduce a function $\nu$ that
retrieves the IRI for a given relation $R$, denoted by $\nu(R)$, and the
IRI for a given attribute $A$ in a relation $R$, denoted by $\nu(A, R)$.
The inductive proof starts by considering the two base relational
algebra queries: the identity query $R$, where $R$ is a relation name in
the relational schema $\mathbf{R}$, and the query $NULL_A$. These two
base queries give rise to the following three base cases for the
inductive proof.

Non-binary relations: Assume that $Q$ is the identity relational
algebra query $R$, where $R \in \mathbf{R}$ is a non-binary relation
(that is, ISBINREL(R) does not hold). Moreover, assume that
$att(R) = \{A_1, \ldots, A_\ell\}$, with the corresponding IRIs
$\nu(R) = r$, $\nu(A_1, R) = a_1$, ..., $\nu(A_\ell, R) = a_\ell$. Then a
SPARQL query $Q^\star$ satisfying (2) is constructed as follows:

SELECT {?A_1, ..., ?A_ℓ}
  (···((((?X, "rdf:type", r) OPT (?X, a_1, ?A_1)) OPT (?X, a_2, ?A_2))
       OPT (?X, a_3, ?A_3)) ··· OPT (?X, a_ℓ, ?A_ℓ))

Notice that in order to not lose information, the operator OPT
is used (instead of AND) because the direct mapping DM does
not translate NULL values. In our example, the relation name
STUDENT is a non-binary relation. Therefore the following
equivalent SPARQL query is generated with input STUDENT:

SELECT {?SID, ?NAME}
  (((?X, "rdf:type", :STUDENT) OPT (?X, :STUDENT#SID, ?SID))
     OPT (?X, :STUDENT#NAME, ?NAME))

It should be noticed that in the previous query, the symbol : has to
be replaced by the base IRI used when generating IRIs for relations
and attributes in a relation (see Section 4.3.1).⁴
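
The base case for non-binary relations is mechanical enough to be generated by a short routine. The Python sketch below builds the query text in the algebraic SPARQL syntax used in this section, nesting one OPT per attribute; the function name and the use of the `:` prefix as a default are assumptions for illustration, and the output is not executable SPARQL 1.1 concrete syntax.

```python
def base_query(relation, attributes, prefix=":"):
    """Build the base-case query for a non-binary relation.

    Starts from the rdf:type pattern and wraps it in one left-nested
    OPT per attribute, so that tuples with NULL values are not lost.
    """
    pattern = f'(?X, "rdf:type", {prefix}{relation})'
    for a in attributes:
        pattern = f"({pattern} OPT (?X, {prefix}{relation}#{a}, ?{a}))"
    select = ", ".join(f"?{a}" for a in attributes)
    return f"SELECT {{{select}}} {pattern}"
```

Calling `base_query("STUDENT", ["SID", "NAME"])` returns the STUDENT query of the running example on a single line.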

Binary relations: Assume that $Q$ is the identity relational algebra
query $R$, where $R \in \mathbf{R}$ is a binary relation (that is,
ISBINREL(R) holds). Moreover, assume that $att(R) = \{A_1, A_2\}$, where
$A_1$ is a foreign key referencing the attribute $B$ of a relation $S$,
and $A_2$ is a foreign key referencing the attribute $C$ of a relation
$T$. Finally, assume that $\nu(R) = r$, $\nu(B, S) = b$ and
$\nu(C, T) = c$. Then a SPARQL query $Q^\star$ satisfying (2) is defined
as follows:

SELECT {?A_1, ?A_2} ((?T_1, r, ?T_2) AND
                     (?T_1, b, ?A_1) AND (?T_2, c, ?A_2)).

Given that a binary relation is mapped to an object property, the
values of a binary relation can be retrieved by querying the datatype
properties of the referenced attributes. In our example, the
relation name ENROLLED is a binary relation. Therefore the following
equivalent SPARQL query is generated with input ENROLLED:

SELECT {?SID, ?CID} (
    (?T_1, :ENROLLED#SID,CID,SID,CID, ?T_2) AND
    (?T_1, :STUDENT#SID, ?SID) AND
    (?T_2, :COURSE#CID, ?CID)).



⁴ In SPARQL terminology, we have included the following prefix
in the query: @prefix : <http://example.edu/db/>, if
the base IRI is <http://example.edu/db/>.



Empty relation: Assume that $Q = NULL_A$, and define $Q^\star$ as
the empty graph pattern { }. Then we have that condition (2) holds
because of the definition of the function $tr$, which does not
translate NULL values to mappings.

We now present the inductive step in the proof of Theorem 2.
Assume that the theorem holds for relational algebra queries $Q_1$
and $Q_2$. That is, there exist SPARQL queries $Q_1^\star$ and
$Q_2^\star$ such that:

\[ tr([\![Q_1]\!]_I) = [\![Q_1^\star]\!]_{\mathcal{DM}(\mathbf{R},\Sigma,I)} \qquad (3) \]
\[ tr([\![Q_2]\!]_I) = [\![Q_2^\star]\!]_{\mathcal{DM}(\mathbf{R},\Sigma,I)} \qquad (4) \]


The proof continues by presenting equivalent SPARQL queries for
the following relational algebra operators: selection ($\sigma$),
projection ($\pi$), rename ($\delta$), join ($\bowtie$), union ($\cup$)
and difference ($\setminus$). It is important to notice that the
operators left-outer join, right-outer join
and full-outer join are all expressible with the previous operators,
hence we do not present cases for these operators.

Selection: We need to consider four cases to define query $Q^\star$
satisfying condition (2). In all these cases, we use the already
established equivalence (3).

1. If $Q$ is $\sigma_{A_1 = a}(Q_1)$, then
   Q* = (Q_1* FILTER (?A_1 = a)).

2. If $Q$ is $\sigma_{A_1 \neq a}(Q_1)$, then
   Q* = (Q_1* FILTER (¬(?A_1 = a) ∧ bound(?A_1))).

3. If $Q$ is $\sigma_{IsNull(A_1)}(Q_1)$, then
   Q* = (Q_1* FILTER (¬bound(?A_1))).

4. If $Q$ is $\sigma_{IsNotNull(A_1)}(Q_1)$, then
   Q* = (Q_1* FILTER (bound(?A_1))).

These equivalences are straightforward. However, it is important
to note the use of bound(·) in the second case; as the semantics
of relational algebra states that if $Q$ is the query
$\sigma_{A_1 \neq a}(Q_1)$, then
$[\![Q]\!]_I = \{t \in [\![Q_1]\!]_I \mid t.A_1 \neq NULL \text{ and } t.A_1 \neq a\}$,
we have that the variable ?A_1 has to be bound because the values in
the attribute $A_1$ in the answer to $\sigma_{A_1 \neq a}(Q_1)$ are
different from NULL. Following our example, we have that the following
SPARQL query is generated with input $\sigma_{Name=Juan}(STUDENT)$:

((SELECT {?SID, ?NAME}
    (((?X, "rdf:type", :STUDENT) OPT (?X, :STUDENT#SID, ?SID))
       OPT (?X, :STUDENT#NAME, ?NAME)))
  FILTER (?NAME = Juan))

Projection: Assume that $Q = \pi_{\{A_1, \ldots, A_\ell\}}(Q_1)$.
Then query $Q^\star$ satisfying condition (2) is defined as
(SELECT {?A_1, ..., ?A_ℓ} Q_1*). It is important to notice
that we use nested SELECT queries to deal with projection, as
well as in two of the base cases, which is a functionality specific to
SPARQL 1.1 [12].

Rename: Assume that $Q = \delta_{A_1 \to B_1}(Q_1)$ and
$att(Q) = \{A_1, \ldots, A_\ell\}$. Then query $Q^\star$ satisfying
condition (2) is defined as
(SELECT {?A_1 AS ?B_1, ?A_2, ..., ?A_ℓ} Q_1*). Notice that this
equivalence holds because the rename operator in relational algebra
renames one attribute to another and projects all attributes of $Q$.



Join: Assume that $Q = (Q_1 \bowtie Q_2)$, where
$(att(Q_1) \cap att(Q_2)) = \{A_1, \ldots, A_\ell\}$. Then query
$Q^\star$ satisfying condition (2) is defined as follows:

((Q_1* FILTER (bound(?A_1) ∧ ··· ∧ bound(?A_ℓ))) AND
 (Q_2* FILTER (bound(?A_1) ∧ ··· ∧ bound(?A_ℓ))))



Note the use of bound(·), which is necessary in the SPARQL
query in order to guarantee that the variables that are being joined
on are not null. Following our example, Figure 1 shows the
SPARQL query generated with input
$\sigma_{Name=Juan}(STUDENT) \bowtie ENROLLED$.
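
To see why the bound(·) filters matter, it helps to evaluate the join on a tiny set of solution mappings. In the Python sketch below, mappings are plain dictionaries and an unbound variable is simply absent; the helper names and the sample data (a student Ana with a null SID) are assumptions made for illustration.

```python
def compatible(m1, m2):
    """Two mappings are compatible if they agree on all shared variables."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def sparql_and(omega1, omega2):
    """AND: union of every compatible pair of mappings."""
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]


def filter_bound(omega, variables):
    """FILTER (bound(?A_1) ∧ ... ∧ bound(?A_l))."""
    return [m for m in omega if all(v in m for v in variables)]


def translated_join(omega1, omega2, shared):
    """The join translation: filter both sides on the shared variables, then AND."""
    return sparql_and(filter_bound(omega1, shared), filter_bound(omega2, shared))


students = [{"?SID": "1", "?NAME": "Juan"}, {"?NAME": "Ana"}]  # Ana has a NULL SID
enrolled = [{"?SID": "1", "?CID": "10"}]
plain = sparql_and(students, enrolled)                 # Ana joins spuriously
safe = translated_join(students, enrolled, ["?SID"])   # only Juan remains
```

A mapping with an unbound join variable is compatible with every mapping on the other side, so a plain AND would pair Ana with SID 1; filtering on bound variables first matches the relational semantics, where NULL never joins.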

Union: Assume that $Q = (Q_1 \cup Q_2)$. Then query $Q^\star$ satisfying
condition (2) is simply defined as (Q_1* UNION Q_2*). Notice that
in this case we are using the already established equivalences (3)
and (4).

Difference: We conclude our proof by assuming that
$Q = (Q_1 \setminus Q_2)$. In this case, it is also possible to define a
SPARQL query $Q^\star$ satisfying condition (2). We refer the reader to
the appendix for the complete description of $Q^\star$.

5.3 Monotonicity and semantics preservation of DM

Finally, we consider the two desirable properties identified in
Section 3.2. First, it is straightforward to see that DM is monotone,
because all the negative atoms in the Datalog rules defining DM
refer to the schema, the PKs and the FKs of the database, and
these elements are kept fixed when checking monotonicity.
Unfortunately, the situation is completely different for the case of
semantics preservation, as the following example shows that the direct
mapping DM does not satisfy this property.

Example 2 Assume that a relational schema contains a relation
with name STUDENT and attributes SID, NAME, and assume that
the attribute SID is the primary key. Moreover, assume that this
relation has two tuples, $t_1$ and $t_2$, such that $t_1.SID = 1$,
$t_1.NAME = John$ and $t_2.SID = 1$, $t_2.NAME = Peter$. It is clear
that the primary key is violated, therefore the database is
inconsistent. However, it is not difficult to see that after applying
DM, the resulting RDF graph is consistent. □

In fact, the result in Example 2 can be generalized as it is possible
to show that the direct mapping DM always generates a consistent
RDF graph, hence, it cannot be semantics preserving.

Proposition 1 The direct mapping DM is not semantics preserving.

Does this mean that our direct mapping is incorrect? What could 
we do to create a direct mapping that is semantics preserving? 
These problems are studied in depth in the following section. 

6. SEMANTICS PRESERVATION OF 
DIRECT MAPPINGS 

We now study the problem of generating a semantics-preserving
direct mapping. Specifically, we show in Section 6.1 that a simple
extension of the direct mapping DM can deal with primary keys.
Then we show in Section 6.2 that dealing with foreign keys is more
difficult, as any direct mapping that satisfies the condition of being
monotone cannot be semantics preserving. Finally, we present two
possible ways of overcoming this limitation.



(((SELECT {?SID, ?NAME}
     (((?X, "rdf:type", :STUDENT) OPT (?X, :STUDENT#SID, ?SID))
        OPT (?X, :STUDENT#NAME, ?NAME)))
   FILTER (?NAME = Juan)) FILTER (bound(?SID)))
AND
((SELECT {?SID, ?CID}
     ((?T_1, :ENROLLED#SID,CID,SID,CID, ?T_2) AND
      (?T_1, :STUDENT#SID, ?SID) AND (?T_2, :COURSE#CID, ?CID)))
   FILTER (bound(?SID)))

Figure 1: SPARQL translation of the relational algebra query $\sigma_{Name=Juan}(STUDENT) \bowtie ENROLLED$.



6.1 A semantics preserving direct mapping for 
primary keys 

Recall that a primary key can be violated if there are repeated values
or null values. At first glance, one would assume that owl:hasKey
could be used to create a semantics preserving direct mapping for
primary keys. If we consider a database without null values, a
violation of the primary key would generate an inconsistency with
owl:hasKey and the unique name assumption (UNA). However, if
we consider a database with null values, then owl:hasKey with the
UNA does not generate an inconsistency because it is trivially
satisfied for a class expression that does not have a value for the
datatype expression. Therefore, we must consider a different approach.

Consider a new direct mapping DM_pk that extends DM as follows.
A Datalog rule is used to determine if the value of a primary
key attribute is repeated, and a family of Datalog rules is used to
determine if there is a value NULL in a column corresponding to
a primary key. If some of these violations are found, then an
artificial triple is generated that would produce an inconsistency. For
example, the following rules are used to map a primary key with
two attributes:

TRIPLE(a, "owl:differentFrom", a) ← PK_2(X_1, X_2, R),
    VALUE(V_1, X_1, T_1, R), VALUE(V_1, X_1, T_2, R),
    VALUE(V_2, X_2, T_1, R), VALUE(V_2, X_2, T_2, R), T_1 ≠ T_2

TRIPLE(a, "owl:differentFrom", a) ← PK_2(X_1, X_2, R),
    VALUE(V, X_1, T, R), V = NULL

TRIPLE(a, "owl:differentFrom", a) ← PK_2(X_1, X_2, R),
    VALUE(V, X_2, T, R), V = NULL

In the previous rules, a is any valid IRI. If we apply DM_pk to the
database of Example 2, it is straightforward to see that starting from
an inconsistent relational database, one obtains an RDF graph that
is also inconsistent. In fact, we have that:

Proposition 2 The direct mapping DM_pk is information preserving,
query preserving, monotone, and semantics preserving if one
considers only PKs. That is, for every relational schema $\mathbf{R}$,
set $\Sigma$ of (only) PKs over $\mathbf{R}$, and instance $I$ of
$\mathbf{R}$: $I \models \Sigma$ iff $\mathcal{DM}_{pk}(\mathbf{R}, \Sigma, I)$
is consistent under OWL semantics.

Information preservation, query preservation and monotonicity of
DM_pk are corollaries of the fact that these properties hold for DM,
and of the fact that the Datalog rules introduced to handle primary
keys are monotone.
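
The primary-key rules of DM_pk can be illustrated with a small check over VALUE facts. The Python sketch below is an assumption-laden reading of those rules, generalized from two key attributes to any number: facts are (value, attribute, tuple id, relation) tuples, NULL is encoded as `None`, and the IRI `A` is an arbitrary placeholder, since the text only requires a to be some valid IRI.

```python
from collections import defaultdict

NULL = None                                  # assumed encoding of NULL
A = "http://example.edu/db/inconsistent"     # hypothetical IRI for the artificial triple


def pk_violation_triples(values, pks):
    """Return the artificial triple if some primary key is violated."""
    rows = defaultdict(dict)
    for v, a, t, r in values:
        rows[(t, r)][a] = v
    seen = set()
    for (t, r), row in rows.items():
        if r not in pks:
            continue
        key = tuple(row.get(a) for a in pks[r])
        # Violation: a NULL in a key column, or a key repeated by another tuple.
        if any(v is NULL for v in key) or (r, key) in seen:
            return {(A, "owl:differentFrom", A)}
        seen.add((r, key))
    return set()
```

Applied to the two STUDENT tuples of Example 2, which share SID = 1, the function returns the single triple stating that `A` differs from itself, which no OWL interpretation can satisfy.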

A natural question at this point is whether DM_pk can also deal
with foreign keys. Unfortunately, it is easy to construct an example
that shows that this is not the case. Does this mean that we cannot
have a direct mapping that is semantics preserving and considers
foreign keys? We show in the following section that monotonicity
has been one of the obstacles to obtaining such a mapping.



6.2 Semantics preserving direct mappings for 
primary keys and foreign keys 

The following theorem shows that the desirable condition of be- 
ing monotone is, unfortunately, an obstacle to obtain a semantics 
preserving direct mapping. 

Theorem 3 No monotone direct mapping is semantics preserving.

It is important to understand the reasons why we have not been 
able to create a semantics preserving direct mapping. The issue 
is with two characteristics of OWL: (1) it adopts the Open World 
Assumption (OWA), where a statement cannot be inferred to be 
false on the basis of failing to prove it, and (2) it does not adopt the 
Unique Name Assumption (UNA), where two different names can 
identify the same thing. On the other hand, a relational database 
adopts the Closed World Assumption (CWA), where a statement is 
inferred to be false if it is not known to be true. In other words, 
what causes an inconsistency in a relational database, can cause an 
inference of new knowledge in OWL. 

In order to preserve the semantics of the relational database, we
need to ensure that whatever causes an inconsistency in a
relational database is going to cause an inconsistency in OWL.
Following this idea, we now present a non-monotone direct mapping,
DM_pk+fk, which extends DM_pk by introducing rules for verifying
beforehand if there is a violation of a foreign key constraint. If
such a violation exists, then an artificial RDF triple is created which
will generate an inconsistency with respect to the OWL semantics.
More precisely, the following family of Datalog rules is used in
DM_pk+fk to detect an inconsistency in a relational database:

VIOLATION(S) ←
    FK_n(X_1, ..., X_n, S, Y_1, ..., Y_n, T),
    VALUE(V_1, X_1, T, S), ..., VALUE(V_n, X_n, T, S),
    V_1 ≠ NULL, ..., V_n ≠ NULL,
    ¬ISVALUE_n(V_1, ..., V_n, Y_1, ..., Y_n, T)

In the preceding rule, the predicate ISVALUE_n is used to check
whether a tuple in a relation has values for some given attributes.
The predicate ISVALUE_n is defined by the following rule:

ISVALUE_n(V_1, ..., V_n, B_1, ..., B_n, S) ←
    VALUE(V_1, B_1, T, S), ..., VALUE(V_n, B_n, T, S)

Finally, the following Datalog rule is used to obtain an
inconsistency in the generated RDF graph:

TRIPLE(a, "owl:differentFrom", a) ← VIOLATION(S)



In the previous rule, a is any valid IRI. It should be noticed that
DM_pk+fk is non-monotone because if new data is added to the
database so that the FK constraint is now satisfied, then the
artificial RDF triple needs to be retracted.
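
The non-monotone behaviour described above can be observed on a two-step example. The Python sketch below is a simplified reading of the VIOLATION rule for a single foreign key; the encoding of facts, the tuple identifiers and the placeholder IRI `A` are assumptions for illustration, while COURSE.CODE referencing DEPT.DID is the foreign key of the running example.

```python
NULL = None                                  # assumed encoding of NULL
A = "http://example.edu/db/inconsistent"     # hypothetical IRI for the artificial triple


def fk_violated(values, fk):
    """fk = (source_attrs, source_relation, target_attrs, target_relation)."""
    src_attrs, s, dst_attrs, t = fk
    rows = {}
    for v, a, tid, r in values:
        rows.setdefault((tid, r), {})[a] = v
    referenced = {tuple(row.get(a) for a in dst_attrs)
                  for (tid, r), row in rows.items() if r == t}
    for (tid, r), row in rows.items():
        if r != s:
            continue
        key = tuple(row.get(a) for a in src_attrs)
        # A violation needs all FK values non-null and no matching target tuple.
        if all(v is not NULL for v in key) and key not in referenced:
            return True
    return False


def fk_triples(values, fk):
    return {(A, "owl:differentFrom", A)} if fk_violated(values, fk) else set()


fk = (["CODE"], "COURSE", ["DID"], "DEPT")
before = [("5", "CODE", "c1", "COURSE")]       # dangling reference
after = before + [("5", "DID", "d1", "DEPT")]  # referenced tuple added
```

Here `fk_triples(before, fk)` contains the artificial triple while `fk_triples(after, fk)` is empty, even though `after` extends `before`: growing the input shrinks the output, which is exactly what monotonicity forbids.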



Theorem 4 The direct mapping DM_pk+fk is information preserving,
query preserving and semantics preserving.

Information preservation and query preservation of DM_pk+fk are
corollaries of the fact that these properties hold for DM and DM_pk.

A direct mapping that satisfies the four properties can be
obtained by considering an alternative semantics of OWL that
expresses integrity constraints. Because OWL is based on Description
Logic, we would need a version of DL that supports integrity
constraints, which is not a new idea. Integrity constraints are
epistemic in nature and are about "what the knowledge base knows" [18].
Extending DL with the epistemic operator K has been studied
[7, 9, 10]. Grimm et al. proposed to extend the semantics of OWL
to support the epistemic operator [11]. Motik et al. proposed to
write integrity constraints as standard OWL axioms but interpreted
with different semantics for data validation purposes [15]. Tao et
al. showed that integrity constraint validation can be reduced to
SPARQL query answering [21]. Recently, Mehdi et al. introduced
a way to answer epistemic queries to restricted OWL ontologies
[14]. Thus, it is possible to extend DM_pk to create an information
preserving, query preserving and monotone direct mapping that is
also semantics preserving, but it is based on a non-standard version
of OWL including the epistemic operator K.

7. CONCLUDING REMARKS 

In this paper, we study how to directly map relational databases to 
an RDF graph with OWL vocabulary based on two fundamental 
properties (information preservation and query preservation) and 
two desirable properties (monotonicity and semantics preservation). 
We first present a monotone, information preserving and query pre- 
serving direct mapping considering databases that have null values. 
Then we prove that the combination of monotonicity with the OWL 
semantics is an obstacle to generating a semantics preserving direct 
mapping. Finally, we overcome this obstacle by presenting a non- 
monotone direct mapping that is semantics preserving, and also by 
discussing the possibility of generating a monotone mapping that 
assumes an extension of OWL with the epistemic operator. 

Related Work: Several approaches directly map relational schemas
to RDFS and OWL. We refer the reader to the survey in
[19]. D2R Server has an option that directly maps the relational
database into RDF; however, this process is not documented [2].
RDBToOnto presents a direct mapping that mines the content of
the relational databases in order to learn ontologies with deeper
taxonomies [8]. Currently, the W3C RDB2RDF Working Group
is developing a direct mapping standard that focuses on translating
relational database instances to RDF [5, 6].

Future Work: We would like to extend our direct mapping to con- 
sider datatypes, relational databases under bag semantics and eval- 
uate this rule based approach on large relational databases. The ex- 
tension of our direct mapping to bag semantics is straightforward. 
In our setting each tuple has its own identifier, which is represented 
in the VALUE predicate. Thus, even if repeated tuples exist, each 
tuple will still have its unique identifier and, therefore, exactly the 
same rules can be used to map relational data under bag semantics. 
APPENDIX



A. ADDITIONAL REFERENCES FOR THE APPENDIX 
B. ADDITIONAL OPERATORS IN RELATIONAL ALGEBRA

It is important to notice that the operators left-outer join, right-outer join and full-outer join are all expressible with the previous operators. For
example, assume that $R$ and $S$ are relation names such that $att(R) \cap att(S) = \{A_1, A_2, \ldots, A_k\}$ and $att(S) \setminus att(R) = \{B_1, B_2, \ldots, B_\ell\}$,
then the left-outer join for $R$ and $S$ is defined by the following expression:

\[
(R \bowtie S) \;\cup\; \Big[ \Big( \sigma_{IsNull(A_1)}(R) \cup \sigma_{IsNull(A_2)}(R) \cup \cdots \cup \sigma_{IsNull(A_k)}(R) \;\cup\;
\big( R \bowtie \big( \sigma_{IsNotNull(A_1)}(\sigma_{IsNotNull(A_2)}(\cdots \sigma_{IsNotNull(A_k)}(\pi_{\{A_1,A_2,\ldots,A_k\}}(R)) \cdots )) \setminus \pi_{\{A_1,A_2,\ldots,A_k\}}(S) \big) \big) \Big)
\bowtie NULL_{B_1} \bowtie NULL_{B_2} \bowtie \cdots \bowtie NULL_{B_\ell} \Big]
\]

Similar expressions can be used to express the right-outer join and the full-outer join. 

C. SEMANTICS OF SPARQL 

Let $P$ be a SPARQL graph pattern. In the rest of the paper, we use $var(P)$ to denote the set of variables occurring in $P$. In particular, if $t$ is a
triple pattern, then $var(t)$ denotes the set of variables occurring in the components of $t$. Similarly, for a built-in condition $R$, we use $var(R)$
to denote the set of variables occurring in $R$.

In what follows, we present the semantics of graph patterns for a fragment of SPARQL for which the semantics of nested SELECT
queries is well understood [12, 23, 24]. More specifically, in what follows we focus on the class of graph patterns $P$ satisfying the following
condition: $P$ is said to be non-parametric if for every sub-pattern $P_1$ = (SELECT {?A_1 AS ?B_1, ..., ?A_m AS ?B_m, ?C_1, ..., ?C_n} P_2)
of $P$ and every variable ?X occurring in $P$, if $?X \in (var(P_2) \setminus \{?A_1, \ldots, ?A_m, ?C_1, \ldots, ?C_n\})$, then ?X does not occur in $P$ outside
$P_1$.

To define the semantics of SPARQL graph pattern expressions, we need to introduce some terminology. A mapping $\mu$ is a partial function
$\mu : V \to (I \cup L)$. Abusing notation, for a triple pattern $t$ we denote by $\mu(t)$ the triple obtained by replacing the variables in $t$ according
to $\mu$. The domain of $\mu$, denoted by $dom(\mu)$, is the subset of $V$ where $\mu$ is defined. Two mappings $\mu_1$ and $\mu_2$ are compatible, denoted by
$\mu_1 \sim \mu_2$, when for all $x \in dom(\mu_1) \cap dom(\mu_2)$, it is the case that $\mu_1(x) = \mu_2(x)$, i.e. when $\mu_1 \cup \mu_2$ is also a mapping. The mapping with
empty domain is denoted by $\mu_\emptyset$ (notice that this mapping is compatible with any other mapping). Given a mapping $\mu$ and a set of variables
$W$, the restriction of $\mu$ to $W$, denoted by $\mu|_W$, is a mapping such that $dom(\mu|_W) = dom(\mu) \cap W$ and $\mu|_W(?X) = \mu(?X)$ for every
$?X \in dom(\mu) \cap W$. Finally, given a mapping $\mu$ and a sequence $?A_1, \ldots, ?A_m, ?B_1, \ldots, ?B_m$ of pairwise distinct elements from $V$ such
that $dom(\mu) \cap \{?B_1, \ldots, ?B_m\} = \emptyset$, define $\rho_{\{?A_1 \to ?B_1, \ldots, ?A_m \to ?B_m\}}(\mu)$ as a mapping such that:

\[
dom(\rho_{\{?A_1 \to ?B_1, \ldots, ?A_m \to ?B_m\}}(\mu)) = (dom(\mu) \setminus \{?A_1, \ldots, ?A_m\}) \cup \{?B_i \mid i \in \{1, \ldots, m\} \text{ and } ?A_i \in dom(\mu)\},
\]

and for every $x \in dom(\rho_{\{?A_1 \to ?B_1, \ldots, ?A_m \to ?B_m\}}(\mu))$:

\[
\rho_{\{?A_1 \to ?B_1, \ldots, ?A_m \to ?B_m\}}(\mu)(x) =
\begin{cases}
\mu(?A_i) & x = ?B_i \text{ for some } i \in \{1, \ldots, m\} \\
\mu(x) & \text{otherwise}
\end{cases}
\]

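We have all the necessary ingredients to define the semantics of graph pattern expressions. As in [16], we define this semantics as a function
$[\![\,\cdot\,]\!]_G$ that takes a graph pattern expression and returns a set of mappings. For the sake of readability, the semantics of filter expressions is
presented separately.

The evaluation of a graph pattern $P$ over an RDF graph $G$, denoted by $[\![P]\!]_G$, is defined recursively as follows.

1. If $P$ is $\{\ \}$ and $G$ is nonempty, then $[\![P]\!]_G = \{\mu_\emptyset\}$. If $P$ is $\{\ \}$ and $G = \emptyset$, then $[\![P]\!]_G = \emptyset$.

2. If $P$ is a triple pattern $t$, then $[\![P]\!]_G = \{\mu \mid dom(\mu) = var(t) \text{ and } \mu(t) \in G\}$.

3. If $P$ is $(P_1 \text{ AND } P_2)$, then $[\![P]\!]_G = \{\mu_1 \cup \mu_2 \mid \mu_1 \in [\![P_1]\!]_G,\ \mu_2 \in [\![P_2]\!]_G \text{ and } \mu_1 \sim \mu_2\}$.

4. If $P$ is $(P_1 \text{ OPT } P_2)$, then $[\![P]\!]_G = \{\mu_1 \cup \mu_2 \mid \mu_1 \in [\![P_1]\!]_G,\ \mu_2 \in [\![P_2]\!]_G \text{ and } \mu_1 \sim \mu_2\} \cup \{\mu \in [\![P_1]\!]_G \mid \text{for every } \mu' \in [\![P_2]\!]_G : \mu \not\sim \mu'\}$.

5. If $P$ is $(P_1 \text{ UNION } P_2)$, then $[\![P]\!]_G = \{\mu \mid \mu \in [\![P_1]\!]_G \text{ or } \mu \in [\![P_2]\!]_G\}$.

6. If $P$ is $(P_1 \text{ MINUS } P_2)$, then $[\![P]\!]_G = \{\mu \in [\![P_1]\!]_G \mid \text{for every } \mu' \in [\![P_2]\!]_G : \mu \not\sim \mu' \text{ or } dom(\mu) \cap dom(\mu') = \emptyset\}$.

7. If $P$ is (SELECT {?A_1 AS ?B_1, ..., ?A_m AS ?B_m, ?C_1, ..., ?C_n} P_1), then:

\[
[\![P]\!]_G = \{\rho_{\{?A_1 \to ?B_1, \ldots, ?A_m \to ?B_m\}}(\mu|_{\{?A_1, \ldots, ?A_m, ?C_1, \ldots, ?C_n\}}) \mid \mu \in [\![P_1]\!]_G\}.
\]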
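
The OPT, UNION and MINUS cases of this definition can be prototyped directly over sets of mappings. The Python sketch below represents a mapping as a dictionary and a set of mappings as a duplicate-free list; the function names are hypothetical, and triple patterns, AND and SELECT are omitted to keep the example short.

```python
def compatible(m1, m2):
    """Mappings are compatible if they agree on every shared variable."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def as_set(mappings):
    """Drop duplicates, since the semantics is defined over sets of mappings."""
    unique = []
    for m in mappings:
        if m not in unique:
            unique.append(m)
    return unique


def opt(omega1, omega2):
    """(P1 OPT P2): joined pairs, plus left mappings with no compatible partner."""
    joined = [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]
    alone = [m1 for m1 in omega1 if all(not compatible(m1, m2) for m2 in omega2)]
    return as_set(joined + alone)


def union(omega1, omega2):
    """(P1 UNION P2)."""
    return as_set(omega1 + omega2)


def minus(omega1, omega2):
    """(P1 MINUS P2): keep m1 unless some m2 is compatible and shares a variable."""
    return as_set([
        m1 for m1 in omega1
        if all(not compatible(m1, m2) or not (m1.keys() & m2.keys()) for m2 in omega2)
    ])
```

For instance, with `omega1 = [{"?X": "s1"}, {"?X": "s2"}]` and `omega2 = [{"?X": "s1", "?NAME": "John"}]`, `opt` keeps `{"?X": "s2"}` unextended, which is how tuples with NULL attributes survive the translation of Section 5.2.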



The semantics of filter expressions goes as follows. Given a mapping $\mu$ and a built-in condition $R$, we say that $\mu$ satisfies $R$, denoted by
$\mu \models R$, if:

1. $R$ is bound(?X) and $?X \in dom(\mu)$;

2. $R$ is ?X = c, $?X \in dom(\mu)$ and $\mu(?X) = c$;

3. $R$ is ?X = ?Y, $?X \in dom(\mu)$, $?Y \in dom(\mu)$ and $\mu(?X) = \mu(?Y)$;

4. $R$ is $(\neg R_1)$, $R_1$ is a built-in condition, and it is not the case that $\mu \models R_1$;

5. $R$ is $(R_1 \vee R_2)$, $R_1$ and $R_2$ are built-in conditions, and $\mu \models R_1$ or $\mu \models R_2$;

6. $R$ is $(R_1 \wedge R_2)$, $R_1$ and $R_2$ are built-in conditions, $\mu \models R_1$ and $\mu \models R_2$.

Then given an RDF graph $G$ and a filter expression (P FILTER R):

\[ [\![(P \text{ FILTER } R)]\!]_G = \{\mu \in [\![P]\!]_G \mid \mu \models R\}. \]
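
The six satisfaction cases translate into a short recursive check. In the Python sketch below, built-in conditions are encoded as nested tuples such as `("bound", "?X")` or `("and", R1, R2)`; this encoding and the function names are assumptions made for illustration.

```python
def satisfies(mu, cond):
    """Decide whether mapping mu satisfies the built-in condition cond."""
    op = cond[0]
    if op == "bound":      # bound(?X)
        return cond[1] in mu
    if op == "eq_const":   # ?X = c
        return cond[1] in mu and mu[cond[1]] == cond[2]
    if op == "eq_var":     # ?X = ?Y
        return cond[1] in mu and cond[2] in mu and mu[cond[1]] == mu[cond[2]]
    if op == "not":
        return not satisfies(mu, cond[1])
    if op == "or":
        return satisfies(mu, cond[1]) or satisfies(mu, cond[2])
    if op == "and":
        return satisfies(mu, cond[1]) and satisfies(mu, cond[2])
    raise ValueError(f"unknown condition: {op}")


def sparql_filter(omega, cond):
    """(P FILTER R): keep the mappings of P that satisfy R."""
    return [mu for mu in omega if satisfies(mu, cond)]


omega = [{"?NAME": "Juan"}, {"?NAME": "Ana"}, {}]
not_juan = ("not", ("eq_const", "?NAME", "Juan"))
not_juan_bound = ("and", not_juan, ("bound", "?NAME"))
```

Filtering `omega` with `not_juan` alone also keeps the empty mapping, because an unbound variable makes the equality fail and hence its negation hold; adding `bound(?NAME)` removes it, which is why the translation of a selection with an inequality in Section 5.2 includes the bound test.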

D. PROOFS

D.1 Proof of Theorem 1

We show that DM is information preserving by providing a computable mapping $\mathcal{N} : \mathcal{G} \to \mathcal{I}$ that satisfies the condition in Definition
2. More precisely, given a relational schema $\mathbf{R}$, a set $\Sigma$ of PKs and FKs and an instance $I$ of $\mathbf{R}$ satisfying $\Sigma$, next we show how $\mathcal{N}(G)$ is
defined for $\mathcal{DM}(\mathbf{R}, \Sigma, I) = G$.

• Step 1: Identify all the ontological class triples (i.e., TRIPLE(r, "rdf:type", "owl:Class")). The IRI r identifies an ontological class R'. For every R' that was retrieved from G, map it to a relation name R.

• Step 2: Identify all the datatype triples of a given class (i.e., TRIPLE(a, "rdf:type", "owl:DatatypeProperty"), TRIPLE(a, "rdfs:domain", r)). The IRI a identifies the datatype property A' and the IRI r identifies the ontological class R' that is the domain of A'. Every datatype property A' with domain R' is mapped to an attribute A of relation name R.

• Step 3: For each class R' and the datatype properties A'_1, ..., A'_n that have domain R', we can recover the instances of relation R with the following SPARQL query:

Q_1 = SELECT {?A_1, ..., ?A_n} (···((((?X, "rdf:type", r) OPT (?X, a_1, ?A_1)) OPT (?X, a_2, ?A_2)) OPT (?X, a_3, ?A_3)) ··· OPT (?X, a_n, ?A_n))



• Step 4: Identify all the object property triples (i.e., TRIPLE(r, "rdf:type", "owl:ObjectProperty")). An IRI r that has only one element to the left of the # sign identifies an object property R' in the ontology that was originally mapped from a binary relation. This object property R' is mapped back to a binary relation name R. The two elements following the # sign identify the attributes of the relation R. From the triples TRIPLE(r, "rdfs:domain", s) and TRIPLE(r, "rdfs:range", t), the IRI s identifies the ontological class S', which is mapped to the relation S, and the IRI t identifies the ontological class T', which is mapped to the relation T. Additionally, the elements in the third and fourth position after the # identify the attributes which are being referenced from relations S and T, respectively. For the sake of simplicity, assume that the relation R references the attribute B of relation S, which is mapped to a datatype property B' with domain S' and IRI b. Additionally, the relation R references the attribute C of relation T, which is mapped to a datatype property C' with domain T' and IRI c.

We can now recover the instances of the relation R with the following SPARQL query:

Q_2 = (SELECT {?A_1, ?A_2} ((?T_1, r, ?T_2) AND (?T_1, b, ?A_1) AND (?T_2, c, ?A_2))).

• Step 5: Given that the result of a SPARQL query is a set Ω of solution mappings μ, we need to translate each solution mapping μ ∈ Ω into a tuple t. We define a function tr^{-1} as the inverse of function tr, that is, for each solution mapping μ and variable ?A in the domain of μ, tr^{-1} assigns the value of μ(?A) to t.A. Then the mapping function N over G is defined as the following relational instance. For every non-binary relation name R identified in Steps 1, 2, 3, define R^{N(G)} as tr^{-1}(⟦Q_1⟧_G), and for every binary relation R identified in Step 4, define R^{N(G)} as tr^{-1}(⟦Q_2⟧_G).

It is straightforward to prove that for every relational schema R, set Σ of PKs and FKs and an instance I of R satisfying Σ, it holds that N(DM(R, Σ, I)) = I. This concludes the proof of the theorem.
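
As an illustration of Steps 3 and 5 for a non-binary relation, the following Python sketch rebuilds tuples directly from a set of triples. It is a simplification under our own assumptions: the function name recover_relation and the "ex:" IRIs in the usage note are hypothetical, the triples are plain string tuples, and each subject is assumed to carry at most one value per datatype property (as is the case when the instance satisfies its primary keys).

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Row = Dict[str, Optional[str]]  # relational tuple; None stands for NULL


def recover_relation(graph: Set[Triple], class_iri: str,
                     attr_iris: Dict[str, str]) -> List[Row]:
    """Rebuild the tuples of one non-binary relation from an RDF graph.

    attr_iris maps each attribute name A to the IRI a of its datatype
    property. The loop mirrors the query Q_1: one (?X, "rdf:type", r)
    pattern, then one OPT per attribute. A missing triple leaves the
    variable unbound, which tr^{-1} reads back as NULL.
    """
    subjects = sorted(s for (s, p, o) in graph
                      if p == "rdf:type" and o == class_iri)
    rows: List[Row] = []
    for s in subjects:
        row: Row = {}
        for attr, iri in attr_iris.items():
            values = sorted(o for (s2, p, o) in graph if s2 == s and p == iri)
            row[attr] = values[0] if values else None  # unbound -> NULL
        rows.append(row)
    return rows
```

For example, a graph with one typed subject carrying both an SID and a NAME value and a second typed subject carrying only an SID value yields two rows, the second with NAME set to None.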

D.2 Proof of Theorem 2

We need to prove that for every relational schema R, set Σ of PKs and FKs over R and relational algebra query Q over R, there exists a SPARQL query Q* such that for every instance I of R (possibly including null values):

tr(⟦Q⟧_I) = ⟦Q*⟧_{DM(R,Σ,I)}.

In what follows, assume that R is a relational schema, Σ is a set of PKs and FKs over R, and I is an instance of R satisfying Σ. The following lemma is used in the proof of the theorem.

Lemma 1 Let Q_1 be a relational algebra query over R such that att(Q_1) = {A_1, ..., A_ℓ}, and assume that Q_1* is a SPARQL graph pattern such that:

tr(⟦Q_1⟧_I) = ⟦Q_1*⟧_{DM(R,Σ,I)}.

Then we have that:

tr(⟦Q_1⟧_I) = ⟦(SELECT {?A_1, ..., ?A_ℓ} Q_1*)⟧_{DM(R,Σ,I)}.



PROOF. First, we prove that tr(⟦Q_1⟧_I) ⊆ ⟦(SELECT {?A_1, ..., ?A_ℓ} Q_1*)⟧_{DM(R,Σ,I)}. Assume that μ ∈ tr(⟦Q_1⟧_I). Then there exists a tuple t ∈ ⟦Q_1⟧_I such that tr(t) = μ. Thus, given that att(Q_1) = {A_1, ..., A_ℓ}, we conclude that dom(μ) ⊆ {?A_1, ..., ?A_ℓ}. Given that tr(⟦Q_1⟧_I) = ⟦Q_1*⟧_{DM(R,Σ,I)}, we have that μ ∈ ⟦Q_1*⟧_{DM(R,Σ,I)}. Hence, from the fact that dom(μ) ⊆ {?A_1, ..., ?A_ℓ}, we conclude that μ ∈ ⟦(SELECT {?A_1, ..., ?A_ℓ} Q_1*)⟧_{DM(R,Σ,I)}.

Second, we prove that ⟦(SELECT {?A_1, ..., ?A_ℓ} Q_1*)⟧_{DM(R,Σ,I)} ⊆ tr(⟦Q_1⟧_I). Assume that μ ∈ ⟦(SELECT {?A_1, ..., ?A_ℓ} Q_1*)⟧_{DM(R,Σ,I)}. Then there exists a mapping μ' ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} such that μ = μ'|_{?A_1, ..., ?A_ℓ}. From the fact that tr(⟦Q_1⟧_I) = ⟦Q_1*⟧_{DM(R,Σ,I)}, we conclude that μ' ∈ tr(⟦Q_1⟧_I). Thus, there exists a tuple t ∈ ⟦Q_1⟧_I such that tr(t) = μ'. But then given that att(Q_1) = {A_1, ..., A_ℓ}, we conclude by definition of tr that dom(μ') ⊆ {?A_1, ..., ?A_ℓ}. Therefore, given that μ = μ'|_{?A_1, ..., ?A_ℓ}, we have that μ = μ' and, hence, μ ∈ tr(⟦Q_1⟧_I) since μ' ∈ tr(⟦Q_1⟧_I).

□

We now prove the theorem by induction on the structure of relational algebra query Q. 

Base Case: For the sake of readability, we introduce a function ν that retrieves the IRI for a given relation R, denoted by ν(R), and the IRI for a given attribute A in a relation R, denoted by ν(A, R). In this part of the proof, we need to consider three cases.

• Non-binary relations: Assume that Q is the identity relational algebra query R, where R is a non-binary relation according to the definition given in Section 4.2. Moreover, assume that att(R) = {A_1, ..., A_ℓ}, with the corresponding IRIs ν(R) = r, ν(A_1, R) = a_1, ..., ν(A_ℓ, R) = a_ℓ. Finally, let Q* be the following SPARQL query:

Q* = SELECT {?A_1, ..., ?A_ℓ} (···((((?X, "rdf:type", r) OPT (?X, a_1, ?A_1)) OPT (?X, a_2, ?A_2)) OPT (?X, a_3, ?A_3)) ··· OPT (?X, a_ℓ, ?A_ℓ))

Next we prove that tr(⟦Q⟧_I) = ⟦Q*⟧_{DM(R,Σ,I)}.

First, we show that tr(⟦Q⟧_I) ⊆ ⟦Q*⟧_{DM(R,Σ,I)}. Assume that μ ∈ tr(⟦Q⟧_I). Then there exists a tuple t ∈ ⟦Q⟧_I such that tr(t) = μ and, hence, t ∈ R^I. Without loss of generality, assume that there exists k ∈ {0, ..., ℓ} such that (1) t.A_i ≠ NULL for every i ∈ {1, ..., k}, and (2) t.A_j = NULL for every j ∈ {k+1, ..., ℓ}. By definition of tr, we have that t.A_i = μ(?A_i) for every i ∈ {1, ..., k}, and that dom(μ) = {?A_1, ..., ?A_k}. Given the definition of DM, we have that the following holds: Class(R) and DTP(A_i, R) for every i ∈ {1, ..., ℓ}. Hence, given that R is not a binary relation (that is, IsBinRel(R) does not hold), we have that the following triples are included in DM(R, Σ, I):

- (r_id, "rdf:type", r), where r_id is the tuple id for the tuple t, and

- (r_id, a_i, v_i), where i ∈ {1, ..., k} and v_i is the value of attribute A_i in the tuple t, that is, t.A_i = v_i.

Thus, given that no triple of the form (r_id, a_j, v_j) is included in DM(R, Σ, I), for j ∈ {k+1, ..., ℓ}, we conclude that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)} by definition of Q* and the fact that μ = tr(t).

Second, we show that ⟦Q*⟧_{DM(R,Σ,I)} ⊆ tr(⟦Q⟧_I). Assume that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)}. Without loss of generality, assume that dom(μ) = {?A_1, ..., ?A_k}, where 0 ≤ k ≤ ℓ. Then by definition of Q*, we have that there exists an IRI r_id such that DM(R, Σ, I) contains triples (r_id, "rdf:type", r) and (r_id, a_i, μ(?A_i)), for every i ∈ {1, ..., k}, and it does not contain a triple of the form (r_id, a_j, v_j), for every j ∈ {k+1, ..., ℓ}. Given the definition of DM(R, Σ, I) and the fact that IsBinRel(R) does not hold, we conclude that there exists a tuple t ∈ R^I such that: (1) the IRI assigned by DM to t is r_id, (2) t.A_i = μ(?A_i) for every i ∈ {1, ..., k}, and (3) t.A_j = NULL for every j ∈ {k+1, ..., ℓ}. Thus, given that tr(t) = μ and t ∈ R^I, we conclude that μ ∈ tr(⟦Q⟧_I) (recall that ⟦Q⟧_I = R^I).

• Binary relations: Assume that Q is the identity relational algebra query R, where R is a binary relation according to the definition given in Section 4.2. Moreover, assume that att(R) = {A_1, A_2}, where A_1 is a foreign key referencing the attribute B of a relation S, and A_2 is a foreign key referencing the attribute C of a relation T. Finally, assume that ν(R) = r, ν(B, S) = b and ν(C, T) = c, and define Q* as the following SPARQL 1.1 query:

Q* = (SELECT {?A_1, ?A_2} ((?T_1, r, ?T_2) AND (?T_1, b, ?A_1) AND (?T_2, c, ?A_2))).

Next we prove that tr(⟦Q⟧_I) = ⟦Q*⟧_{DM(R,Σ,I)}.

First, we show that tr(⟦Q⟧_I) ⊆ ⟦Q*⟧_{DM(R,Σ,I)}. Assume that μ ∈ tr(⟦Q⟧_I). Then there exists a tuple t ∈ ⟦Q⟧_I such that tr(t) = μ and, hence, t ∈ R^I. Given the definition of mapping DM, we have that all the following hold: BinRel(R, A_1, A_2, S, B, T, C), PK_2(A_1, A_2, R), FK_1(A_1, R, B, S), FK_1(A_2, R, C, T), Class(S), DTP(B, S), Class(T), DTP(C, T), Rel(S), Attr(B, S), Rel(T) and Attr(C, T). From this, we conclude that there exist tuples t_1 ∈ S^I, t_2 ∈ T^I such that t.A_1 = t_1.B ≠ NULL and t.A_2 = t_2.C ≠ NULL, and we also conclude that the following triples are included in DM(R, Σ, I):

- (s_id, r, t_id), where s_id is the tuple id for tuple t_1 and t_id is the tuple id for tuple t_2,

- (s_id, b, v_1), where v_1 is the value of attribute B in the tuple t_1, that is, t_1.B = v_1,

- (t_id, c, v_2), where v_2 is the value of attribute C in the tuple t_2, that is, t_2.C = v_2.

Given that t.A_1 = t_1.B = v_1, t.A_2 = t_2.C = v_2 and tr(t) = μ, we conclude by definition of Q* that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)}.

Second, we show that ⟦Q*⟧_{DM(R,Σ,I)} ⊆ tr(⟦Q⟧_I). Assume that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)}, which implies that dom(μ) = {?A_1, ?A_2}. By definition of Q*, we have that there exist IRIs s_id, t_id such that the following triples are in DM(R, Σ, I): (s_id, r, t_id), (s_id, b, μ(?A_1)) and (t_id, c, μ(?A_2)). Hence, by definition of DM, we have that there exist tuples t_1 ∈ S^I, t_2 ∈ T^I such that: (1) s_id is the IRI assigned to t_1 by DM, (2) t_1.B = μ(?A_1), (3) t_id is the IRI assigned to t_2 by DM, and (4) t_2.C = μ(?A_2). Moreover, we also have by definition of DM that the following holds: BinRel(R, A_1, A_2, S, B, T, C), FK_1(A_1, R, B, S) and FK_1(A_2, R, C, T). Hence, there exists a tuple t ∈ R^I such that t.A_1 = t_1.B = μ(?A_1) and t.A_2 = t_2.C = μ(?A_2). Therefore, given that μ = tr(t) (since att(R) = {A_1, A_2} and dom(μ) = {?A_1, ?A_2}) and t ∈ ⟦Q⟧_I (since ⟦Q⟧_I = R^I), we conclude that μ ∈ tr(⟦Q⟧_I).

• Third, assume that Q = NULL_A, and let Q* be the SPARQL query { }. We have that ⟦Q⟧_I = {t}, where t is a tuple with domain {A} such that t.A = NULL. Moreover, we have that ⟦Q*⟧_{DM(R,Σ,I)} = {μ_∅}, where μ_∅ is the mapping with empty domain, since DM(R, Σ, I) is a nonempty RDF graph. Thus, given that tr(t) = μ_∅, we conclude that tr(⟦Q⟧_I) = ⟦Q*⟧_{DM(R,Σ,I)}.

Inductive Step: Assume that the theorem holds for relational algebra queries Q_1 and Q_2. That is, there exist SPARQL queries Q_1* and Q_2* such that:

tr(⟦Q_1⟧_I) = ⟦Q_1*⟧_{DM(R,Σ,I)},
tr(⟦Q_2⟧_I) = ⟦Q_2*⟧_{DM(R,Σ,I)}.

To continue with the proof, we need to consider the following operators: selection (σ), projection (π), rename (δ), join (⋈), union (∪) and difference (∖).

• Selection: We need to consider four cases.

- Case 1. Assume that Q = σ_{A_1=a}(Q_1), and Q* = (Q_1* FILTER (?A_1 = a)). Next we prove that tr(⟦Q⟧_I) = ⟦Q*⟧_{DM(R,Σ,I)}.

First, we show that tr(⟦Q⟧_I) ⊆ ⟦Q*⟧_{DM(R,Σ,I)}. Assume that μ ∈ tr(⟦Q⟧_I). Then there exists a tuple t ∈ ⟦Q⟧_I such that tr(t) = μ. Thus, we have that t ∈ ⟦Q_1⟧_I and t.A_1 = a. By definition of tr, we know that t.A_1 = μ(?A_1), from which we conclude that μ(?A_1) = a given that t.A_1 = a. Therefore, μ ⊨ (?A_1 = a), from which we conclude that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)} since μ = tr(t) and tr(t) ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} by induction hypothesis.

Second, we show that ⟦Q*⟧_{DM(R,Σ,I)} ⊆ tr(⟦Q⟧_I). Assume that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)}. Then μ ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} and μ ⊨ (?A_1 = a), that is, μ(?A_1) = a. By induction hypothesis, we have that μ ∈ tr(⟦Q_1⟧_I), and, hence, there exists a tuple t ∈ ⟦Q_1⟧_I such that tr(t) = μ. By definition of tr, we know that t.A_1 = μ(?A_1), from which we conclude that t.A_1 = a given that μ(?A_1) = a. Given that t ∈ ⟦Q_1⟧_I and t.A_1 = a, we have that t ∈ ⟦Q⟧_I. Therefore, we conclude that μ ∈ tr(⟦Q⟧_I) since tr(t) = μ.

- Case 2. Assume that Q = σ_{A_1≠a}(Q_1), and Q* = (Q_1* FILTER (¬(?A_1 = a) ∧ bound(?A_1))). Next we prove that tr(⟦Q⟧_I) = ⟦Q*⟧_{DM(R,Σ,I)}.

First, we show that tr(⟦Q⟧_I) ⊆ ⟦Q*⟧_{DM(R,Σ,I)}. Assume that μ ∈ tr(⟦Q⟧_I). Then there exists a tuple t ∈ ⟦Q⟧_I such that tr(t) = μ. Given that t ∈ ⟦Q⟧_I, we have by the definition of the semantics of relational algebra that t ∈ ⟦Q_1⟧_I, t.A_1 ≠ a and t.A_1 ≠ NULL. Thus, by definition of tr we have that t.A_1 = μ(?A_1) and μ(?A_1) ≠ a. Hence, we have that μ ⊨ (¬(?A_1 = a) ∧ bound(?A_1)), from which we conclude that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)} since μ = tr(t) and tr(t) ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} by induction hypothesis.

Second, we show that ⟦Q*⟧_{DM(R,Σ,I)} ⊆ tr(⟦Q⟧_I). Assume that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)}. Then μ ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} and μ ⊨ (¬(?A_1 = a) ∧ bound(?A_1)), that is, ?A_1 ∈ dom(μ) and μ(?A_1) ≠ a. By induction hypothesis we have that μ ∈ tr(⟦Q_1⟧_I) and, hence, there exists a tuple t ∈ ⟦Q_1⟧_I such that tr(t) = μ. Given that ?A_1 ∈ dom(μ) and μ(?A_1) ≠ a, it holds that t.A_1 ≠ NULL and t.A_1 ≠ a. Thus, we have that t ∈ ⟦Q⟧_I, from which we conclude that μ ∈ tr(⟦Q⟧_I) since μ = tr(t).

- Case 3. Assume that Q = σ_{IsNull(A_1)}(Q_1), and Q* = (Q_1* FILTER (¬bound(?A_1))). Next we prove that tr(⟦Q⟧_I) = ⟦Q*⟧_{DM(R,Σ,I)}.

First, we show that tr(⟦Q⟧_I) ⊆ ⟦Q*⟧_{DM(R,Σ,I)}. Assume that μ ∈ tr(⟦Q⟧_I). Then there exists a tuple t ∈ ⟦Q⟧_I such that tr(t) = μ. Given that t ∈ ⟦Q⟧_I, we have that t ∈ ⟦Q_1⟧_I and t.A_1 = NULL. Thus, we conclude by definition of tr that ?A_1 ∉ dom(μ) and, hence, μ ⊨ ¬bound(?A_1). Therefore, we have that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)} given that μ = tr(t) and tr(t) ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} by induction hypothesis.

Second, we show that ⟦Q*⟧_{DM(R,Σ,I)} ⊆ tr(⟦Q⟧_I). Assume that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)}. Then μ ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} and μ ⊨ (¬bound(?A_1)), that is, ?A_1 ∉ dom(μ). By induction hypothesis we have that μ ∈ tr(⟦Q_1⟧_I), from which we conclude that there exists a tuple t ∈ ⟦Q_1⟧_I such that tr(t) = μ. By definition of tr and given that ?A_1 ∉ dom(μ), we have that t.A_1 = NULL and, hence, t ∈ ⟦Q⟧_I. Therefore, we conclude that μ ∈ tr(⟦Q⟧_I) since μ = tr(t).

- Case 4. Assume that Q = σ_{IsNotNull(A_1)}(Q_1), and Q* = (Q_1* FILTER (bound(?A_1))). Next we prove that tr(⟦Q⟧_I) = ⟦Q*⟧_{DM(R,Σ,I)}.

First, we show that tr(⟦Q⟧_I) ⊆ ⟦Q*⟧_{DM(R,Σ,I)}. Assume that μ ∈ tr(⟦Q⟧_I). Then there exists a tuple t ∈ ⟦Q⟧_I such that tr(t) = μ. Given that t ∈ ⟦Q⟧_I, we have that t ∈ ⟦Q_1⟧_I and t.A_1 ≠ NULL. Thus, by definition of tr we have that ?A_1 ∈ dom(μ) and μ(?A_1) = t.A_1 and, hence, μ ⊨ bound(?A_1). Therefore, we conclude that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)} given that μ = tr(t) and tr(t) ∈ ⟦Q_1*⟧_{DM(R,Σ,I)}.

Second, we show that ⟦Q*⟧_{DM(R,Σ,I)} ⊆ tr(⟦Q⟧_I). Assume that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)}. Then μ ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} and μ ⊨ bound(?A_1), that is, ?A_1 ∈ dom(μ). By induction hypothesis we have that there exists a tuple t ∈ ⟦Q_1⟧_I such that tr(t) = μ. Thus, by definition of tr we have that t.A_1 = μ(?A_1), which implies that t.A_1 ≠ NULL. Therefore, we have that t ∈ ⟦Q⟧_I and, hence, μ ∈ tr(⟦Q⟧_I) since μ = tr(t).
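
The role of the bound(?A_1) conjunct in Case 2 can be seen on a small example. The Python sketch below is illustrative only: rows are dictionaries with None standing for NULL, tr follows the informal description used in this proof (NULL attributes are dropped), and all function names are our own. It relies on the two-valued filter semantics of this appendix, where ¬(?A_1 = a) holds whenever ?A_1 is unbound.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]  # relational tuple; None stands for NULL
Mapping = Dict[str, str]        # solution mapping: variable -> value


def tr(t: Row) -> Mapping:
    """Translate a tuple into a mapping: attribute A becomes ?A, NULLs are dropped."""
    return {"?" + a: v for a, v in t.items() if v is not None}


def select_neq(rows: List[Row], attr: str, a: str) -> List[Row]:
    """sigma_{attr != a}: a NULL value never satisfies the comparison."""
    return [t for t in rows if t[attr] is not None and t[attr] != a]


def filter_neq_bound(ms: List[Mapping], var: str, a: str) -> List[Mapping]:
    """FILTER (not(?var = a) and bound(?var)), as used for Q* in Case 2."""
    return [m for m in ms if var in m and m[var] != a]


def filter_neq_only(ms: List[Mapping], var: str, a: str) -> List[Mapping]:
    """FILTER (not(?var = a)) alone: an unbound ?var passes the filter."""
    return [m for m in ms if not (var in m and m[var] == a)]


rows: List[Row] = [{"A1": "a"}, {"A1": "b"}, {"A1": None}]
mappings = [tr(t) for t in rows]
```

On these rows the relational selection keeps only the tuple with value b, and so does the filter with bound(?A_1); dropping the bound conjunct additionally keeps the empty mapping that stands for the NULL tuple.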

• Projection: Assume that Q = π_{A_1, ..., A_ℓ}(Q_1), and Q* = (SELECT {?A_1, ..., ?A_ℓ} Q_1*). Next we prove that tr(⟦Q⟧_I) = ⟦Q*⟧_{DM(R,Σ,I)}.

First, we show that tr(⟦Q⟧_I) ⊆ ⟦Q*⟧_{DM(R,Σ,I)}. Assume that μ ∈ tr(⟦Q⟧_I). Then there exists a tuple t ∈ ⟦Q⟧_I such that tr(t) = μ. Given that t ∈ ⟦Q⟧_I, there exists a tuple t' ∈ ⟦Q_1⟧_I such that for every A ∈ att(Q) : t.A = t'.A. Without loss of generality, assume that: (1) att(Q) = {A_1, ..., A_k, A_{k+1}, ..., A_ℓ}, (2) t.A_i ≠ NULL for every i ∈ {1, ..., k}, and (3) t.A_j = NULL for every j ∈ {k+1, ..., ℓ}. By definition of tr, we have that t.A_i = μ(?A_i) for every i ∈ {1, ..., k}, and that dom(μ) = {?A_1, ..., ?A_k}. Given that t' ∈ ⟦Q_1⟧_I, we have for μ' = tr(t') that: (1) μ' ∈ tr(⟦Q_1⟧_I), (2) dom(μ) ⊆ dom(μ'), (3) dom(μ) = ({?A_1, ..., ?A_ℓ} ∩ dom(μ')), and (4) t.A_i = t'.A_i = μ(?A_i) = μ'(?A_i) for every i ∈ {1, ..., k}. Thus, we have in particular that:

μ = μ'|_{?A_1, ..., ?A_ℓ}. (†)

By induction hypothesis we have that μ' ∈ ⟦Q_1*⟧_{DM(R,Σ,I)}, from which we conclude that μ'|_{?A_1, ..., ?A_ℓ} ∈ ⟦Q*⟧_{DM(R,Σ,I)}. Thus, we conclude from (†) that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)}.

Second, we show that ⟦Q*⟧_{DM(R,Σ,I)} ⊆ tr(⟦Q⟧_I). Assume that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)}. Then there exists a mapping μ' ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} such that μ = μ'|_{?A_1, ..., ?A_ℓ}. By induction hypothesis, we have that μ' ∈ tr(⟦Q_1⟧_I), from which we conclude that there exists a tuple t' ∈ ⟦Q_1⟧_I such that tr(t') = μ'. Let t be a tuple with domain {A_1, ..., A_ℓ} such that t.A_i = t'.A_i for every i ∈ {1, ..., ℓ}. Then, given that t' ∈ ⟦Q_1⟧_I, we have that t ∈ ⟦Q⟧_I, and given that μ' = tr(t') and μ = μ'|_{?A_1, ..., ?A_ℓ}, we have that μ = tr(t). Therefore, we conclude that μ ∈ tr(⟦Q⟧_I).

• Rename: Assume that att(Q_1) = {A_1, ..., A_ℓ} and Q = δ_{A_1→B_1}(Q_1), and let Q* = (SELECT {?A_1 AS ?B_1, ?A_2, ..., ?A_ℓ} Q_1*). Next we prove that tr(⟦Q⟧_I) = ⟦Q*⟧_{DM(R,Σ,I)}.

First, we show that tr(⟦Q⟧_I) ⊆ ⟦Q*⟧_{DM(R,Σ,I)}. Assume that μ ∈ tr(⟦Q⟧_I). Then there exists a tuple t ∈ ⟦Q⟧_I such that tr(t) = μ. Given that t ∈ ⟦Q⟧_I, there exists a tuple t' ∈ ⟦Q_1⟧_I such that t.B_1 = t'.A_1 and t.A_i = t'.A_i for every i ∈ {2, ..., ℓ}. Without loss of generality, assume that there exists k ∈ {1, ..., ℓ} such that: (1) t.A_i ≠ NULL for every i ∈ {2, ..., k}, and (2) t.A_j = NULL for every j ∈ {k+1, ..., ℓ}. To finish the proof, we consider two cases.

- Assume that t.B_1 ≠ NULL. Then it follows from conditions (1), (2) and definition of tr that μ(?B_1) = t.B_1 = t'.A_1, μ(?A_i) = t.A_i = t'.A_i for every i ∈ {2, ..., k} and dom(μ) = {?B_1, ?A_2, ..., ?A_k}. Let μ' = tr(t'). Then by definition of tr, we have that ρ_{?A_1→?B_1}(μ') = μ. Moreover, given that μ' = tr(t') and t' ∈ ⟦Q_1⟧_I, we conclude that μ' ∈ tr(⟦Q_1⟧_I) and, hence, μ' ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} by induction hypothesis. Thus, we have that ρ_{?A_1→?B_1}(μ'|_{?A_1, ..., ?A_ℓ}) ∈ ⟦Q*⟧_{DM(R,Σ,I)}, from which we conclude that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)} since μ'|_{?A_1, ..., ?A_ℓ} = μ' and ρ_{?A_1→?B_1}(μ') = μ.

- Assume that t.B_1 = NULL. Then it follows from conditions (1), (2) and definition of tr that μ(?A_i) = t.A_i = t'.A_i for every i ∈ {2, ..., k} and dom(μ) = {?A_2, ..., ?A_k}. Let μ' = tr(t'). Then by definition of tr, we have that ρ_{?A_1→?B_1}(μ') = μ. Moreover, given that μ' = tr(t') and t' ∈ ⟦Q_1⟧_I, we conclude that μ' ∈ tr(⟦Q_1⟧_I) and, hence, μ' ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} by induction hypothesis. Thus, we have that ρ_{?A_1→?B_1}(μ'|_{?A_1, ..., ?A_ℓ}) ∈ ⟦Q*⟧_{DM(R,Σ,I)}, from which we conclude that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)} since μ'|_{?A_1, ..., ?A_ℓ} = μ' and ρ_{?A_1→?B_1}(μ') = μ.

Second, we show that ⟦Q*⟧_{DM(R,Σ,I)} ⊆ tr(⟦Q⟧_I). Assume that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)}. Then there exists a mapping μ' ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} such that μ = ρ_{?A_1→?B_1}(μ'|_{?A_1, ..., ?A_ℓ}). By induction hypothesis, we have that μ' ∈ tr(⟦Q_1⟧_I), from which we conclude that there exists a tuple t' ∈ ⟦Q_1⟧_I such that tr(t') = μ'. Let t be a tuple with domain {B_1, A_2, ..., A_ℓ} such that t.B_1 = t'.A_1 and t.A_i = t'.A_i for every i ∈ {2, ..., ℓ}. Then we have that t ∈ ⟦Q⟧_I. Given that μ' = tr(t') and μ = ρ_{?A_1→?B_1}(μ'|_{?A_1, ..., ?A_ℓ}), we have that μ = tr(t). Therefore, we conclude that μ ∈ tr(⟦Q⟧_I).

• Join: Assume that Q = (Q_1 ⋈ Q_2), where (att(Q_1) ∩ att(Q_2)) = {A_1, ..., A_ℓ}, and let

Q* = ((Q_1* FILTER (bound(?A_1) ∧ ··· ∧ bound(?A_ℓ))) AND (Q_2* FILTER (bound(?A_1) ∧ ··· ∧ bound(?A_ℓ)))).

Next we prove that tr(⟦Q⟧_I) = ⟦Q*⟧_{DM(R,Σ,I)}.

First, we show that tr(⟦Q⟧_I) ⊆ ⟦Q*⟧_{DM(R,Σ,I)}. Assume that μ ∈ tr(⟦Q⟧_I). Then there exists a tuple t such that μ = tr(t) and t ∈ ⟦Q⟧_I. Thus, we have that there exist tuples t_1 ∈ ⟦Q_1⟧_I and t_2 ∈ ⟦Q_2⟧_I such that: (1) t.A_i = t_1.A_i = t_2.A_i ≠ NULL for every i ∈ {1, ..., ℓ}, (2) t.A = t_1.A for every A ∈ (att(Q_1) ∖ att(Q_2)), and (3) t.A = t_2.A for every A ∈ (att(Q_2) ∖ att(Q_1)). Let μ_1 = tr(t_1) and μ_2 = tr(t_2). By induction hypothesis and given that μ_1 ∈ tr(⟦Q_1⟧_I) and μ_2 ∈ tr(⟦Q_2⟧_I), we have that μ_1 ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} and μ_2 ∈ ⟦Q_2*⟧_{DM(R,Σ,I)}. Hence, from condition (1) and definition of tr, we conclude that:

μ_1 ∈ ⟦(Q_1* FILTER (bound(?A_1) ∧ ··· ∧ bound(?A_ℓ)))⟧_{DM(R,Σ,I)},
μ_2 ∈ ⟦(Q_2* FILTER (bound(?A_1) ∧ ··· ∧ bound(?A_ℓ)))⟧_{DM(R,Σ,I)}.

Thus, given that μ = μ_1 ∪ μ_2 by conditions (1), (2), (3) and definition of tr, we conclude that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)}.

Second, we show that ⟦Q*⟧_{DM(R,Σ,I)} ⊆ tr(⟦Q⟧_I). Assume that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)}. Then there exist mappings μ_1, μ_2 such that: (1) μ = μ_1 ∪ μ_2, (2) μ_1 ∈ ⟦(Q_1* FILTER (bound(?A_1) ∧ ··· ∧ bound(?A_ℓ)))⟧_{DM(R,Σ,I)}, and (3) μ_2 ∈ ⟦(Q_2* FILTER (bound(?A_1) ∧ ··· ∧ bound(?A_ℓ)))⟧_{DM(R,Σ,I)}. By induction hypothesis, we have that μ_1 ∈ tr(⟦Q_1⟧_I) and μ_2 ∈ tr(⟦Q_2⟧_I). Thus, there exist tuples t_1 ∈ ⟦Q_1⟧_I, t_2 ∈ ⟦Q_2⟧_I such that μ_1 = tr(t_1) and μ_2 = tr(t_2). From conditions (1), (2), (3) and definition of tr, we have that t_1.A_i = t_2.A_i = μ(?A_i) ≠ NULL for every i ∈ {1, ..., ℓ}. Thus, given that (att(Q_1) ∩ att(Q_2)) = {A_1, ..., A_ℓ}, we have that t ∈ ⟦Q⟧_I, where t : (att(Q_1) ∪ att(Q_2)) → (D ∪ {NULL}) is such that: (4) t.A_i = t_1.A_i = t_2.A_i for every i ∈ {1, ..., ℓ}, (5) t.A = t_1.A for every A ∈ (att(Q_1) ∖ att(Q_2)), and (6) t.A = t_2.A for every A ∈ (att(Q_2) ∖ att(Q_1)). Hence, we conclude that μ ∈ tr(⟦Q⟧_I), given that μ = tr(t) by definition of t, definition of tr and conditions (1), (2) and (3).
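
To see why both operands of AND are filtered with bound conditions in the join case, consider the following Python sketch. It is a toy illustration under our own assumptions (rows as dictionaries with None for NULL, hypothetical function names): two mappings that leave the shared variable unbound are compatible, so an unfiltered AND would combine tuples that the relational join keeps apart.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]  # relational tuple; None stands for NULL
Mapping = Dict[str, str]        # solution mapping: variable -> value


def tr(t: Row) -> Mapping:
    """Translate a tuple into a mapping, dropping NULL attributes."""
    return {"?" + a: v for a, v in t.items() if v is not None}


def compatible(m1: Mapping, m2: Mapping) -> bool:
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def natural_join(r1: List[Row], r2: List[Row], shared: List[str]) -> List[Row]:
    """Relational join on the shared attributes; NULL never joins."""
    return [{**t1, **t2} for t1 in r1 for t2 in r2
            if all(t1[a] is not None and t1[a] == t2[a] for a in shared)]


def and_(o1: List[Mapping], o2: List[Mapping]) -> List[Mapping]:
    """SPARQL AND on already evaluated operands."""
    return [{**a, **b} for a in o1 for b in o2 if compatible(a, b)]


def join_pattern(o1: List[Mapping], o2: List[Mapping],
                 shared_vars: List[str]) -> List[Mapping]:
    """Q* of the join case: both sides filtered by bound(?A_1) and ... and bound(?A_l)."""
    f1 = [m for m in o1 if all(v in m for v in shared_vars)]
    f2 = [m for m in o2 if all(v in m for v in shared_vars)]
    return and_(f1, f2)


rows1: List[Row] = [{"A": "1", "B": "x"}, {"A": None, "B": "y"}]
rows2: List[Row] = [{"A": "1", "C": "u"}, {"A": None, "C": "v"}]
ms1 = [tr(t) for t in rows1]
ms2 = [tr(t) for t in rows2]
```

Here the relational join produces a single tuple and join_pattern(ms1, ms2, ["?A"]) produces its translation, whereas and_(ms1, ms2) without the filters returns four mappings.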

• Union: Assume that Q = (Q_1 ∪ Q_2) and Q* = (Q_1* UNION Q_2*). Next we prove that tr(⟦Q⟧_I) = ⟦Q*⟧_{DM(R,Σ,I)}.

First, we show that tr(⟦Q⟧_I) ⊆ ⟦Q*⟧_{DM(R,Σ,I)}. Assume that μ ∈ tr(⟦Q⟧_I). Then there exists a tuple t ∈ ⟦Q⟧_I such that tr(t) = μ. Thus, we have that t ∈ ⟦Q_1⟧_I or t ∈ ⟦Q_2⟧_I. Without loss of generality, assume that t ∈ ⟦Q_1⟧_I. Then we have that tr(t) ∈ tr(⟦Q_1⟧_I) and, hence, tr(t) ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} by induction hypothesis. Therefore, μ ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} since tr(t) = μ, from which we conclude that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)}.

Second, we show that ⟦Q*⟧_{DM(R,Σ,I)} ⊆ tr(⟦Q⟧_I). Assume that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)}. Then μ ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} or μ ∈ ⟦Q_2*⟧_{DM(R,Σ,I)}. Without loss of generality, assume that μ ∈ ⟦Q_1*⟧_{DM(R,Σ,I)}. Then, by induction hypothesis, we have that μ ∈ tr(⟦Q_1⟧_I), and, hence, there exists a tuple t ∈ ⟦Q_1⟧_I such that tr(t) = μ. Therefore, we conclude that t ∈ ⟦(Q_1 ∪ Q_2)⟧_I, from which we deduce that μ ∈ tr(⟦Q⟧_I).

• Difference: Assume that Q = (Q_1 ∖ Q_2), and that att(Q_1) = att(Q_2) = {A_1, ..., A_ℓ}. Then for every (not necessarily nonempty) set X = {i_1, i_2, ..., i_p} such that 1 ≤ i_1 < i_2 < ... < i_p ≤ ℓ, define R_X as the following filter condition:

(bound(?A_{i_1}) ∧ bound(?A_{i_2}) ∧ ··· ∧ bound(?A_{i_p}) ∧ ¬bound(?A_{j_1}) ∧ ¬bound(?A_{j_2}) ∧ ··· ∧ ¬bound(?A_{j_q})),

where 1 ≤ j_1 < j_2 < ··· < j_q ≤ ℓ and {j_1, j_2, ..., j_q} = ({1, ..., ℓ} ∖ {i_1, i_2, ..., i_p}). That is, condition R_X indicates that every variable ?A_i with i ∈ X is bound, while every variable ?A_j with j ∈ ({1, ..., ℓ} ∖ X) is not bound. Moreover, for every X ≠ ∅ define SPARQL graph pattern P_X as follows:

P_X = ((Q_1* FILTER R_X) MINUS (Q_2* FILTER R_X)).

Notice that there are 2^ℓ - 1 possible graph patterns P_X with X ≠ ∅. Let P_1, P_2, ..., P_{2^ℓ-1} be an enumeration of these graph patterns. Moreover, assuming that ?X, ?Y, ?Z are fresh variables, let P_∅ be the following query:

P_∅ = (((Q_1* FILTER R_∅) OPT ((Q_2* FILTER R_∅) AND (?X, ?Y, ?Z))) FILTER (¬bound(?X))).

Then graph pattern Q* is defined as follows:

Q* = (P_1 UNION P_2 UNION ··· UNION P_{2^ℓ-1} UNION P_∅).

Next we show that tr(⟦Q⟧_I) = ⟦Q*⟧_{DM(R,Σ,I)}. In this proof, we assume, by considering Lemma 1, that for every mapping μ such that μ ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} or μ ∈ ⟦Q_2*⟧_{DM(R,Σ,I)}, it holds that dom(μ) ⊆ {?A_1, ..., ?A_ℓ}.

First, we show that tr(⟦Q⟧_I) ⊆ ⟦Q*⟧_{DM(R,Σ,I)}. Assume that μ ∈ tr(⟦Q⟧_I). Then there exists a tuple t ∈ ⟦Q⟧_I such that tr(t) = μ. Thus, we have that t ∈ ⟦Q_1⟧_I and t ∉ ⟦Q_2⟧_I, from which we conclude by considering the induction hypothesis that μ ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} and μ ∉ ⟦Q_2*⟧_{DM(R,Σ,I)}. We consider two cases to show that this implies that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)}.

- Assume that dom(μ) ≠ ∅, and let X = {i ∈ {1, ..., ℓ} | ?A_i ∈ dom(μ)}. Given that μ ∈ ⟦Q_1*⟧_{DM(R,Σ,I)}, we have that dom(μ) ⊆ {?A_1, ..., ?A_ℓ} and, hence, X ≠ ∅. Furthermore, we have that μ ⊨ R_X and, hence, μ ∈ ⟦(Q_1* FILTER R_X)⟧_{DM(R,Σ,I)}. From this and the fact that μ ∉ ⟦Q_2*⟧_{DM(R,Σ,I)}, we conclude that:

μ ∈ ⟦((Q_1* FILTER R_X) MINUS (Q_2* FILTER R_X))⟧_{DM(R,Σ,I)}. (‡)

To see why this is the case, assume that (‡) does not hold. Then given that μ ∈ ⟦(Q_1* FILTER R_X)⟧_{DM(R,Σ,I)}, we conclude by definition of the operator MINUS that there exists a mapping μ' ∈ ⟦(Q_2* FILTER R_X)⟧_{DM(R,Σ,I)} such that μ ∼ μ' and (dom(μ) ∩ dom(μ')) ≠ ∅. Given that μ' ∈ ⟦Q_2*⟧_{DM(R,Σ,I)}, we have that dom(μ') ⊆ {?A_1, ..., ?A_ℓ}. Thus, given that μ ⊨ R_X, μ' ⊨ R_X and dom(μ) ⊆ {?A_1, ..., ?A_ℓ}, we conclude that dom(μ) = dom(μ'). Therefore, given that μ ∼ μ', we have that μ = μ', from which we conclude that μ ∈ ⟦Q_2*⟧_{DM(R,Σ,I)}, leading to a contradiction.

From (‡) and definition of Q*, we conclude that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)} since ((Q_1* FILTER R_X) MINUS (Q_2* FILTER R_X)) = P_i for some i ∈ {1, ..., 2^ℓ - 1} (recall that X ≠ ∅).

- Assume that dom(μ) = ∅. Then we have that μ ⊨ R_∅ and, hence, μ ∈ ⟦(Q_1* FILTER R_∅)⟧_{DM(R,Σ,I)}. From this and the fact that μ ∉ ⟦Q_2*⟧_{DM(R,Σ,I)}, we conclude that:

μ ∈ ⟦(((Q_1* FILTER R_∅) OPT ((Q_2* FILTER R_∅) AND (?X, ?Y, ?Z))) FILTER (¬bound(?X)))⟧_{DM(R,Σ,I)}. (⋆)

To see why this is the case, assume that (⋆) does not hold. Then given that μ ∈ ⟦(Q_1* FILTER R_∅)⟧_{DM(R,Σ,I)} and dom(μ) = ∅, we have that there exists a mapping μ' ∈ ⟦((Q_2* FILTER R_∅) AND (?X, ?Y, ?Z))⟧_{DM(R,Σ,I)} such that ?X ∈ dom(μ'). Thus, there exist mappings μ_1 ∈ ⟦(Q_2* FILTER R_∅)⟧_{DM(R,Σ,I)} and μ_2 ∈ ⟦(?X, ?Y, ?Z)⟧_{DM(R,Σ,I)} such that μ' = μ_1 ∪ μ_2. Given that μ_1 ∈ ⟦(Q_2* FILTER R_∅)⟧_{DM(R,Σ,I)}, we have that μ_1 ∈ ⟦Q_2*⟧_{DM(R,Σ,I)} and μ_1 ⊨ R_∅. Thus, we have that dom(μ_1) ⊆ {?A_1, ..., ?A_ℓ}, from which we conclude that dom(μ_1) = ∅ (since μ_1 ⊨ R_∅). Therefore, we have that μ = μ_1, which implies that μ ∈ ⟦Q_2*⟧_{DM(R,Σ,I)} and leads to a contradiction.

From (⋆) and definition of Q*, we conclude that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)}.
Second, we show that ⟦Q*⟧_{DM(R,Σ,I)} ⊆ tr(⟦Q⟧_I). Assume that μ ∈ ⟦Q*⟧_{DM(R,Σ,I)}. Then we consider two cases to prove that μ ∈ tr(⟦Q⟧_I).

- Assume that there exists i ∈ {1, ..., 2^ℓ - 1} such that μ ∈ ⟦P_i⟧_{DM(R,Σ,I)}. Then there exists X ≠ ∅ such that μ ∈ ⟦((Q_1* FILTER R_X) MINUS (Q_2* FILTER R_X))⟧_{DM(R,Σ,I)}. Thus, we have that μ ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} and μ ⊨ R_X, from which we conclude that ∅ ⊊ dom(μ) ⊆ {?A_1, ..., ?A_ℓ}. From this fact and definition of the MINUS operator, we obtain that μ ∉ ⟦Q_2*⟧_{DM(R,Σ,I)}. Hence, by induction hypothesis, we conclude that μ ∈ tr(⟦Q_1⟧_I) and μ ∉ tr(⟦Q_2⟧_I). That is, there exists a tuple t such that tr(t) = μ, t ∈ ⟦Q_1⟧_I and t ∉ ⟦Q_2⟧_I, from which we conclude that μ ∈ tr(⟦Q⟧_I).

- Assume that (⋆) holds. First we show that ⟦(Q_2* FILTER R_∅)⟧_{DM(R,Σ,I)} = ∅. For the sake of contradiction, assume that there exists a mapping μ' ∈ ⟦(Q_2* FILTER R_∅)⟧_{DM(R,Σ,I)}. Then given that μ' ∈ ⟦Q_2*⟧_{DM(R,Σ,I)} and μ' ⊨ R_∅, we conclude that dom(μ') = ∅. Given that DM(R, Σ, I) is a nonempty RDF graph and dom(μ') = ∅, we conclude that there exists a mapping μ'' ∈ ⟦((Q_2* FILTER R_∅) AND (?X, ?Y, ?Z))⟧_{DM(R,Σ,I)} such that dom(μ'') = {?X, ?Y, ?Z}. Thus, given that variables ?X, ?Y, ?Z are not mentioned in (Q_1* FILTER R_∅), we conclude that μ'' is compatible with every mapping in ⟦(Q_1* FILTER R_∅)⟧_{DM(R,Σ,I)}. Thus, by definition of the OPT operator, we conclude that ?X belongs to the domain of every mapping in ⟦((Q_1* FILTER R_∅) OPT ((Q_2* FILTER R_∅) AND (?X, ?Y, ?Z)))⟧_{DM(R,Σ,I)}, which implies that ⟦(((Q_1* FILTER R_∅) OPT ((Q_2* FILTER R_∅) AND (?X, ?Y, ?Z))) FILTER (¬bound(?X)))⟧_{DM(R,Σ,I)} = ∅. But this leads to a contradiction, as we assume that (⋆) holds.

Given that (⋆) holds and ⟦(Q_2* FILTER R_∅)⟧_{DM(R,Σ,I)} = ∅, we conclude that μ ∈ ⟦(Q_1* FILTER R_∅)⟧_{DM(R,Σ,I)} and μ ∉ ⟦(Q_2* FILTER R_∅)⟧_{DM(R,Σ,I)}. Hence, we have that μ ∈ ⟦Q_1*⟧_{DM(R,Σ,I)} and μ ∉ ⟦Q_2*⟧_{DM(R,Σ,I)} and, therefore, we conclude by induction hypothesis that μ ∈ tr(⟦Q_1⟧_I) and μ ∉ tr(⟦Q_2⟧_I). That is, there exists a tuple t such that tr(t) = μ, t ∈ ⟦Q_1⟧_I and t ∉ ⟦Q_2⟧_I, from which we conclude that μ ∈ tr(⟦Q⟧_I).
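
The construction used for difference can be sketched operationally. The Python fragment below is an illustration under our own assumptions, not the authors' implementation: it works on already evaluated operands rather than on graph patterns, the function names are hypothetical, and the treatment of the empty-domain case assumes a nonempty RDF graph, as in the proof.

```python
from typing import Dict, List

Mapping = Dict[str, str]  # solution mapping: variable -> value


def compatible(m1: Mapping, m2: Mapping) -> bool:
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def minus(o1: List[Mapping], o2: List[Mapping]) -> List[Mapping]:
    """MINUS: drop a only if some b is compatible with it and shares a variable."""
    return [a for a in o1
            if all(not compatible(a, b) or not (a.keys() & b.keys()) for b in o2)]


def difference_pattern(o1: List[Mapping], o2: List[Mapping]) -> List[Mapping]:
    """Evaluate Q* = P_1 UNION ... UNION P_{2^l - 1} UNION P_empty.

    For each mapping m of the first operand, only mappings of the second
    operand with exactly the same domain are considered, which is what
    the filter conditions R_X enforce.
    """
    result: List[Mapping] = []
    for m in o1:
        same_domain = [b for b in o2 if b.keys() == m.keys()]
        if m:
            # P_X with X nonempty: MINUS between operands filtered by R_X.
            result += minus([m], same_domain)
        elif not same_domain:
            # P_empty: on a nonempty graph, the OPT / not-bound(?X) construction
            # keeps the empty mapping iff the second operand has no empty mapping.
            result.append(m)
    return result


o1: List[Mapping] = [{"?A": "1"}, {"?A": "1", "?B": "2"}, {}]
o2: List[Mapping] = [{"?A": "1", "?B": "2"}]
```

On this input a plain minus(o1, o2) removes the mapping {"?A": "1"}, because it is compatible with and shares a variable with the mapping in o2, even though the corresponding tuples (1, NULL) and (1, 2) are different. Grouping by domain avoids this, and the separate case for the empty mapping handles the fact that MINUS can never remove it.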

D.3 Proof of Proposition [1] 

Assume that we have a relational schema containing a relation with name STUDENT and attributes SID, NAME, and assume that the attribute SID is the primary key. Moreover, assume that this relation has two tuples, t_1 and t_2, such that t_1.SID = 1, t_1.NAME = John and t_2.SID = 1, t_2.NAME = Peter. It is clear that the primary key is violated, therefore the database is inconsistent. If DM were semantics preserving, then the resulting RDF graph would be inconsistent under the OWL semantics. However, applying DM returns the following consistent RDF graph (assuming a base IRI "http://example.edu/db/" for the mapping):

TRIPLE("http://example.edu/db/STUDENT", "rdf:type", "owl:Class")
TRIPLE("http://example.edu/db/STUDENT#NAME", "rdf:type", "owl:DatatypeProperty")
TRIPLE("http://example.edu/db/STUDENT#NAME", "rdfs:domain", "http://example.edu/db/STUDENT")
TRIPLE("http://example.edu/db/STUDENT#SID", "rdf:type", "owl:DatatypeProperty")
TRIPLE("http://example.edu/db/STUDENT#SID", "rdfs:domain", "http://example.edu/db/STUDENT")
TRIPLE("http://example.edu/db/STUDENT#SID=1", "http://example.edu/db/STUDENT#NAME", "John")
TRIPLE("http://example.edu/db/STUDENT#SID=1", "http://example.edu/db/STUDENT#NAME", "Peter")
TRIPLE("http://example.edu/db/STUDENT#SID=1", "http://example.edu/db/STUDENT#SID", "1")

Therefore, DM is not semantics preserving. □
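
The counterexample can be reproduced with a small Python sketch. This is a deliberately simplified stand-in for DM, written under our own assumptions: the function map_relation is hypothetical, it handles a single relation with datatype properties only, and it emits exactly the kinds of triples listed above (no foreign keys, no binary relations).

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
BASE = "http://example.edu/db/"  # base IRI used in the example above


def map_relation(name: str, attrs: List[str], pk: List[str],
                 rows: List[Dict[str, Optional[object]]]) -> Set[Triple]:
    """Simplified direct mapping of one relation (illustrative only)."""
    rel_iri = BASE + name
    triples: Set[Triple] = {(rel_iri, "rdf:type", "owl:Class")}
    for a in attrs:
        triples.add((f"{rel_iri}#{a}", "rdf:type", "owl:DatatypeProperty"))
        triples.add((f"{rel_iri}#{a}", "rdfs:domain", rel_iri))
    for row in rows:
        # The subject IRI is built from the primary key values only, so two
        # tuples that violate the key collapse onto the same subject.
        key = ",".join(f"{k}={row[k]}" for k in pk)
        subject = f"{rel_iri}#{key}"
        for a in attrs:
            if row[a] is not None:  # NULL values produce no triple
                triples.add((subject, f"{rel_iri}#{a}", str(row[a])))
    return triples


students = [{"SID": 1, "NAME": "John"}, {"SID": 1, "NAME": "Peter"}]
graph = map_relation("STUDENT", ["SID", "NAME"], ["SID"], students)
```

The resulting graph contains the eight triples of the listing: the key violation only shows up as one subject with two NAME values, which by itself does not contradict anything under the OWL semantics.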

D.4 Proof of Proposition 2

It is straightforward to see that given a relational schema R, set Σ of (only) PKs over R and instance I of R such that I ⊨ Σ, it holds that DM_pk(R, Σ, I) is consistent under the OWL semantics. Likewise, if I ⊭ Σ, then by definition of DM_pk, the resulting RDF graph will have an inconsistent triple TRIPLE(a, "owl:differentFrom", a), which would generate an inconsistency under the OWL semantics.

D.5 Proof of Theorem g] 

For the sake of contradiction, assume that M is a monotone and semantics preserving direct mapping. Then consider a schema R containing at least two distinct relation names R_1, R_2, and consider a set Σ of PKs and FKs over R containing at least one foreign key from R_1 to R_2. Then we have that there exist instances I_1, I_2 of R such that I_1 ⊆ I_2, I_1 does not satisfy Σ and I_2 does satisfy Σ. Given that M is semantics preserving, we know that M(R, Σ, I_2) is consistent under the OWL semantics, while M(R, Σ, I_1) is not. Given that M is monotone, we have that M(R, Σ, I_1) ⊆ M(R, Σ, I_2). But then we conclude that M(R, Σ, I_1) is also consistent under the OWL semantics, given that M(R, Σ, I_2) is consistent and M(R, Σ, I_1) ⊆ M(R, Σ, I_2), which leads to a contradiction.
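
A concrete pair of instances of the kind used in this argument can be written down as follows. The Python sketch is only an illustration with hypothetical names: relation R1 has an attribute A that is a foreign key to attribute B of R2, and instances are dictionaries from relation names to lists of rows.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]
Instance = Dict[str, List[Row]]


def satisfies_fk(inst: Instance, child: str, child_attr: str,
                 parent: str, parent_attr: str) -> bool:
    """Check one foreign key: every non-NULL child value occurs in the parent."""
    parent_values = {t[parent_attr] for t in inst[parent]}
    return all(t[child_attr] is None or t[child_attr] in parent_values
               for t in inst[child])


def is_subinstance(i1: Instance, i2: Instance) -> bool:
    """I1 is contained in I2: every tuple of every relation of I1 is also in I2."""
    return all(t in i2[rel] for rel in i1 for t in i1[rel])


# I1 has a dangling reference; I2 extends I1 with the referenced tuple.
I1: Instance = {"R1": [{"A": "1"}], "R2": []}
I2: Instance = {"R1": [{"A": "1"}], "R2": [{"B": "1"}]}
```

Here I1 is contained in I2, I1 violates the foreign key and I2 satisfies it, so a monotone mapping must produce for I1 a subgraph of the graph it produces for I2.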

D.6 Proof of Theorem H 

It is straightforward to see that given a relational schema R, set Σ of PKs and FKs over R and instance I of R such that I ⊨ Σ, it holds that DM_pk+fk(R, Σ, I) is consistent under the OWL semantics. Likewise, if I ⊭ Σ, then by definition of DM_pk+fk, the resulting RDF graph will contain an inconsistent triple TRIPLE(a, "owl:differentFrom", a), which would generate an inconsistency under the OWL semantics.



